## Implementing a Neural Network in C
By John A. Bullinaria, School of Computer Science, The University of Birmingham, UK.

This document contains a step by step guide to implementing a simple neural network in C. It is aimed mainly at students who wish to (or have been told to) incorporate a neural network learning component into a larger system they are building. Obviously there are many types of neural network one could consider using - here I shall concentrate on one particularly common and useful type, namely a simple three-layer feed-forward back-propagation network (multi-layer perceptron).

This type of network will be useful when we have a set of input vectors and a corresponding set of output vectors, and our system must produce an appropriate output for each input it is given. Of course, if we already have a complete noise-free set of input and output vectors, then a simple look-up table would suffice. However, if we want the system to generalize, i.e. produce appropriate outputs for inputs it has never seen before, then a network that has learned the mapping between the known inputs and outputs will often do a good job for new inputs as well.

I shall assume that the reader is already familiar with C and, for more details about neural networks in general, simply refer the reader to the newsgroup comp.ai.neural-nets and its associated Neural Networks FAQ.

A single neuron (i.e. processing unit) takes its total input In and produces an output activation Out. I shall take this to be the sigmoid function

```c
Out = 1.0/(1.0 + exp(-In)) ;   /* Out = Sigmoid(In) */
```

though other activation functions are often used (e.g. linear or hyperbolic tangent). This has the effect of squashing the infinite range of In into the range 0 to 1. The sigmoid also has the convenient property that its derivative takes the particularly simple form

```c
Sigmoid_Derivative = Sigmoid * (1.0 - Sigmoid) ;
```
Typically the input In into a given neuron will be the weighted sum of the output activations feeding in from a number of other neurons, and it is convenient to think of the activations flowing through layers of neurons. So, if there are NumUnits1 neurons in layer 1, the total activation flowing into a layer 2 neuron is the sum over Layer1Out[i]*Weight[i], where Weight[i] is the strength/weight of the connection from unit i in layer 1. Each neuron also has a bias, or resting state, that is added to the sum of inputs; it is convenient to call this Weight[0]. We can then write

```c
Layer2In = Weight[0] ;                        /* start with the bias */
for( i = 1 ; i <= NumUnits1 ; i++ ) {         /* loop over the layer 1 units */
    Layer2In += Layer1Out[i] * Weight[i] ;    /* add in the weighted contributions */
}
Layer2Out = 1.0/(1.0 + exp(-Layer2In)) ;      /* compute the sigmoid to give the activation */
```

Normally layer 2 will have many units as well, so it is appropriate to write the weights between unit i in layer 1 and unit j in layer 2 as a two-dimensional array Weight[i][j]. Thus, to get the output of unit j in layer 2 we have

```c
Layer2In[j] = Weight[0][j] ;
for( i = 1 ; i <= NumUnits1 ; i++ ) {
    Layer2In[j] += Layer1Out[i] * Weight[i][j] ;
}
Layer2Out[j] = 1.0/(1.0 + exp(-Layer2In[j])) ;
```

Remember that in C the array indices start from zero, not one, so we would declare our variables as

```c
double Layer1Out[NumUnits1+1] ;
double Layer2In[NumUnits2+1] ;
double Layer2Out[NumUnits2+1] ;
double Weight[NumUnits1+1][NumUnits2+1] ;
```

(or, more likely, declare pointers and use calloc or malloc to allocate the memory).
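Since the layer sizes are often only known at run time, the calloc route is usually the right one. A minimal sketch of such an allocator (the helper names `alloc_matrix` and `free_matrix` are mine, not from the guide):

```c
#include <stdlib.h>

/* Allocate a (rows+1) x (cols+1) matrix of doubles so that indices can
   run 1..rows and 1..cols, with index 0 free for the bias weights, as in
   the text. Returns NULL on allocation failure. */
double **alloc_matrix(int rows, int cols)
{
    double **m = calloc(rows + 1, sizeof(double *));
    if (m == NULL) return NULL;
    for (int i = 0; i <= rows; i++) {
        m[i] = calloc(cols + 1, sizeof(double));
        if (m[i] == NULL) {                   /* clean up anything already allocated */
            for (int j = 0; j < i; j++) free(m[j]);
            free(m);
            return NULL;
        }
    }
    return m;
}

void free_matrix(double **m, int rows)
{
    for (int i = 0; i <= rows; i++) free(m[i]);
    free(m);
}
```

A nice side effect is that calloc zeroes the memory, so weight-change arrays allocated this way start out at zero automatically.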
Naturally, we need another loop to get all the layer 2 outputs

```c
for( j = 1 ; j <= NumUnits2 ; j++ ) {
    Layer2In[j] = Weight[0][j] ;
    for( i = 1 ; i <= NumUnits1 ; i++ ) {
        Layer2In[j] += Layer1Out[i] * Weight[i][j] ;
    }
    Layer2Out[j] = 1.0/(1.0 + exp(-Layer2In[j])) ;
}
```

Three-layer networks are necessary and sufficient for most purposes, so our layer 2 outputs feed into a third layer in the same way as above
```c
for( j = 1 ; j <= NumUnits2 ; j++ ) {      /* j loop computes layer 2 activations */
    Layer2In[j] = Weight12[0][j] ;
    for( i = 1 ; i <= NumUnits1 ; i++ ) {
        Layer2In[j] += Layer1Out[i] * Weight12[i][j] ;
    }
    Layer2Out[j] = 1.0/(1.0 + exp(-Layer2In[j])) ;
}
for( k = 1 ; k <= NumUnits3 ; k++ ) {      /* k loop computes layer 3 activations */
    Layer3In[k] = Weight23[0][k] ;
    for( j = 1 ; j <= NumUnits2 ; j++ ) {
        Layer3In[k] += Layer2Out[j] * Weight23[j][k] ;
    }
    Layer3Out[k] = 1.0/(1.0 + exp(-Layer3In[k])) ;
}
```

The code can start to become confusing at this point - I find that keeping a separate index (i, j, k) for each layer helps, as does an intuitive notation for distinguishing between the two arrays of weights (Weight12 and Weight23). For three-layer networks it is traditional to call layer 1 the Input layer, layer 2 the Hidden layer, and layer 3 the Output layer, with sizes NumInput, NumHidden and NumOutput. Also, to save getting all the In's and Out's confused, we can write SumH and SumO for the sums flowing into the hidden and output units. Our code then becomes
```c
for( j = 1 ; j <= NumHidden ; j++ ) {      /* j loop computes hidden unit activations */
    SumH[j] = WeightIH[0][j] ;
    for( i = 1 ; i <= NumInput ; i++ ) {
        SumH[j] += Input[i] * WeightIH[i][j] ;
    }
    Hidden[j] = 1.0/(1.0 + exp(-SumH[j])) ;
}
for( k = 1 ; k <= NumOutput ; k++ ) {      /* k loop computes output unit activations */
    SumO[k] = WeightHO[0][k] ;
    for( j = 1 ; j <= NumHidden ; j++ ) {
        SumO[k] += Hidden[j] * WeightHO[j][k] ;
    }
    Output[k] = 1.0/(1.0 + exp(-SumO[k])) ;
}
```

Generally we will have a whole set of NumPattern training patterns, i.e. pairs of input and target output vectors Input[p][i] and Target[p][k], labelled by the index p. The network learns by minimizing some measure of the error between its actual outputs and the target outputs, for example the sum squared error over all output units k and all training patterns p

```c
Error = 0.0 ;
for( p = 1 ; p <= NumPattern ; p++ ) {
    for( k = 1 ; k <= NumOutput ; k++ ) {
        Error += 0.5 * (Target[p][k] - Output[p][k]) * (Target[p][k] - Output[p][k]) ;
    }
}
```

(The factor of 0.5 is conventionally included to simplify the algebra in deriving the learning algorithm.) If we insert the above code for computing the network outputs into the computation of Error we get

```c
Error = 0.0 ;
for( p = 1 ; p <= NumPattern ; p++ ) {     /* p loop over training patterns */
    for( j = 1 ; j <= NumHidden ; j++ ) {  /* j loop over hidden units */
        SumH[p][j] = WeightIH[0][j] ;
        for( i = 1 ; i <= NumInput ; i++ ) {
            SumH[p][j] += Input[p][i] * WeightIH[i][j] ;
        }
        Hidden[p][j] = 1.0/(1.0 + exp(-SumH[p][j])) ;
    }
    for( k = 1 ; k <= NumOutput ; k++ ) {  /* k loop over output units */
        SumO[p][k] = WeightHO[0][k] ;
        for( j = 1 ; j <= NumHidden ; j++ ) {
            SumO[p][k] += Hidden[p][j] * WeightHO[j][k] ;
        }
        Output[p][k] = 1.0/(1.0 + exp(-SumO[p][k])) ;
        Error += 0.5 * (Target[p][k] - Output[p][k]) * (Target[p][k] - Output[p][k]) ;   /* Sum Squared Error */
    }
}
```

I'll leave the reader to dispense with any indices that they don't need for the purposes of their own system (e.g. the indices on SumH and SumO).

The next stage is to iteratively adjust the weights to minimize the network's
error. A standard way to do this is by 'gradient descent' on the error function.
We can compute how much the error is changed by a small change in each weight (i.e. compute the partial derivatives dError/dWeight) and shift each weight by a small amount in the direction that reduces the error. Accumulating these derivatives through the network gives the standard back-propagation weight updates, here with a momentum term

```c
Error = 0.0 ;
for( p = 1 ; p <= NumPattern ; p++ ) {        /* repeat for all the training patterns */
    for( j = 1 ; j <= NumHidden ; j++ ) {     /* compute hidden unit activations */
        SumH[p][j] = WeightIH[0][j] ;
        for( i = 1 ; i <= NumInput ; i++ ) {
            SumH[p][j] += Input[p][i] * WeightIH[i][j] ;
        }
        Hidden[p][j] = 1.0/(1.0 + exp(-SumH[p][j])) ;
    }
    for( k = 1 ; k <= NumOutput ; k++ ) {     /* compute output unit activations and errors */
        SumO[p][k] = WeightHO[0][k] ;
        for( j = 1 ; j <= NumHidden ; j++ ) {
            SumO[p][k] += Hidden[p][j] * WeightHO[j][k] ;
        }
        Output[p][k] = 1.0/(1.0 + exp(-SumO[p][k])) ;
        Error += 0.5 * (Target[p][k] - Output[p][k]) * (Target[p][k] - Output[p][k]) ;
        DeltaO[k] = (Target[p][k] - Output[p][k]) * Output[p][k] * (1.0 - Output[p][k]) ;
    }
    for( j = 1 ; j <= NumHidden ; j++ ) {     /* 'back-propagate' errors to hidden layer */
        SumDOW[j] = 0.0 ;
        for( k = 1 ; k <= NumOutput ; k++ ) {
            SumDOW[j] += WeightHO[j][k] * DeltaO[k] ;
        }
        DeltaH[j] = SumDOW[j] * Hidden[p][j] * (1.0 - Hidden[p][j]) ;
    }
    for( j = 1 ; j <= NumHidden ; j++ ) {     /* update weights WeightIH */
        DeltaWeightIH[0][j] = eta * DeltaH[j] + alpha * DeltaWeightIH[0][j] ;
        WeightIH[0][j] += DeltaWeightIH[0][j] ;
        for( i = 1 ; i <= NumInput ; i++ ) {
            DeltaWeightIH[i][j] = eta * Input[p][i] * DeltaH[j] + alpha * DeltaWeightIH[i][j] ;
            WeightIH[i][j] += DeltaWeightIH[i][j] ;
        }
    }
    for( k = 1 ; k <= NumOutput ; k++ ) {     /* update weights WeightHO */
        DeltaWeightHO[0][k] = eta * DeltaO[k] + alpha * DeltaWeightHO[0][k] ;
        WeightHO[0][k] += DeltaWeightHO[0][k] ;
        for( j = 1 ; j <= NumHidden ; j++ ) {
            DeltaWeightHO[j][k] = eta * Hidden[p][j] * DeltaO[k] + alpha * DeltaWeightHO[j][k] ;
            WeightHO[j][k] += DeltaWeightHO[j][k] ;
        }
    }
}
```

(There is clearly plenty of scope for re-ordering, combining and simplifying the loops here - I will leave that for the reader to do once they have understood what the separate code sections are doing.) The weight changes DeltaWeightIH and DeltaWeightHO are each made up of two components: the eta component, which is the gradient descent contribution, and the alpha component, a 'momentum' term that effectively keeps a moving average of the gradient descent contributions and thus smooths out the overall weight changes. Fixing good values of the learning parameters eta and alpha is usually a matter of trial and error.

The complete training process will consist of repeating the above weight updates for a number of epochs (using another for loop) until some error criterion is met, for example the total Error falling below some chosen small value
```c
for( epoch = 1 ; epoch < LARGENUMBER ; epoch++ ) {
    /* ABOVE CODE FOR ONE ITERATION */
    if( Error < SMALLNUMBER ) break ;
}
```

If the training patterns are presented in the same systematic order during each
epoch, it is possible for weight oscillations to occur. It is therefore generally
a good idea to use a new random order for the training patterns for each epoch. If
we put the NumPattern training pattern indices p in random order into an array ranpat[], then it is simply a matter of replacing our training pattern loop

```c
for( p = 1 ; p <= NumPattern ; p++ ) {
```

with

```c
for( np = 1 ; np <= NumPattern ; np++ ) {
    p = ranpat[np] ;
```

Generating the random array ranpat[] is not quite so simple, but the following code will do the job
```c
for( p = 1 ; p <= NumPattern ; p++ ) {    /* set up the ordered array */
    ranpat[p] = p ;
}
for( p = 1 ; p <= NumPattern ; p++ ) {    /* swap random elements into each position */
    np = p + rando() * ( NumPattern + 1 - p ) ;
    op = ranpat[p] ;
    ranpat[p] = ranpat[np] ;
    ranpat[np] = op ;
}
```

Naturally, one must set some initial network weights to start the learning
process. Starting all the weights at zero is generally not a good idea, as that
is often a local minimum of the error function. It is normal to initialize all
the weights with small random values. If rando() is a function that returns random numbers uniformly distributed in the range 0 to 1, and smallwt is the maximum absolute size of the initial weights, then an appropriate section of weight initialization code would be

```c
for( j = 1 ; j <= NumHidden ; j++ ) {     /* initialize WeightIH and DeltaWeightIH */
    for( i = 0 ; i <= NumInput ; i++ ) {
        DeltaWeightIH[i][j] = 0.0 ;
        WeightIH[i][j] = 2.0 * ( rando() - 0.5 ) * smallwt ;
    }
}
for( k = 1 ; k <= NumOutput ; k++ ) {     /* initialize WeightHO and DeltaWeightHO */
    for( j = 0 ; j <= NumHidden ; j++ ) {
        DeltaWeightHO[j][k] = 0.0 ;
        WeightHO[j][k] = 2.0 * ( rando() - 0.5 ) * smallwt ;
    }
}
```

Note that it is a good idea to set all the initial DeltaWeights to zero at the same time.

We now have enough code to put together a working neural network program. I
have cut and pasted the above code into the file nn.c
(which your browser should allow you to save into your own file space). I have
added the standard #includes, declared all the variables, hard coded the standard
XOR training data and values for eta, alpha, smallwt and NumHidden. I've left plenty for the reader to do to convert this into a useful program, for example:

- reading the training data from file, rather than hard coding it;
- allowing the parameters (eta, alpha, smallwt, NumHidden, etc.) to be varied during runtime;
- having the array sizes determined, and the memory allocated, during runtime;
- saving the weights to file, and reading them back in again.

There are also numerous network variations that could be implemented, for example real (rather than binary) valued outputs, which require linear output units

```c
Output[p][k] = SumO[p][k] ;
DeltaO[k] = Target[p][k] - Output[p][k] ;
```

or a Cross-Entropy cost function rather than the Sum Squared Error, which is often better suited to classification problems

```c
Error -= ( Target[p][k] * log( Output[p][k] ) + ( 1.0 - Target[p][k] ) * log( 1.0 - Output[p][k] ) ) ;
DeltaO[k] = Target[p][k] - Output[p][k] ;
```