12 August 2011

Neural Network (Part 5): The Back Propagation process


Back propagation is the process by which you move backwards through the neural network to adjust the weights and biases so as to reduce the total error of the network. The total error of the network is essentially the difference between the end results (actual outputs) and the expected results. If you expected to get a result of 1 but instead got a 0, you would go back through the network and tweak each of the weight (and bias) values so that your end result was a little bit closer to 1 than before.

The process of back-propagation is such that larger errors produce larger corrections, and connections with larger weights carry more of the blame back to the neurons behind them. Bigger weights have a bigger influence on the final outcome than smaller weights, and are therefore held more responsible for incorrect answers.

After many training cycles, the neural network reaches a stage of equilibrium (not quite, but close enough), whereby the tweaking is insignificant to the final outcome.

If you under-train, then you will get the wrong result more often than desired.
If you over-train, then the neural network will not be able to "think outside the square", so to speak: it memorises the training examples and copes poorly with inputs it has not seen before.

So how do you propagate backwards?

Step 1: Feed-forward pass.
Send some data through the network to populate all the variables. This feed-forward pass allows you to calculate your actualOUTPUTs, which you will compare against your expectedOUTPUTs.
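
To make the feed-forward pass concrete, here is a minimal sketch of it. It assumes a sigmoid activation function (which is consistent with the aO x (1-aO) term used in the delta-error formulas), and the class name, method names and example numbers are made up for illustration; they are not the code from the earlier parts of this series.

```java
// Illustrative sketch only: class and method names are assumptions, not the tutorial's code.
class FeedForwardSketch {

    // Sigmoid activation (assumed): squashes any value into the range 0 to 1.
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // One neuron: add up (connEntry x weight) for every connection, add the bias,
    // then pass the total through the activation function.
    static double neuronOutput(double[] connEntries, double[] weights, double bias) {
        double total = bias;
        for (int i = 0; i < connEntries.length; i++) {
            total += connEntries[i] * weights[i];
        }
        return sigmoid(total);
    }

    // One layer: weights[n] holds the connection weights of neuron n.
    static double[] layerOutputs(double[] inputs, double[][] weights, double[] biases) {
        double[] outputs = new double[biases.length];
        for (int n = 0; n < biases.length; n++) {
            outputs[n] = neuronOutput(inputs, weights[n], biases[n]);
        }
        return outputs;
    }

    public static void main(String[] args) {
        // Made-up network: 2 inputs, 2 hidden neurons, 1 output neuron.
        double[] inputs = {1.0, 0.0};
        double[][] hiddenWeights = {{0.5, -0.4}, {0.3, 0.8}};
        double[] hiddenBiases = {0.1, -0.2};
        double[] hiddenOutputs = layerOutputs(inputs, hiddenWeights, hiddenBiases);

        double[][] outputWeights = {{0.7, -0.6}};
        double[] outputBiases = {0.05};
        double[] actualOutputs = layerOutputs(hiddenOutputs, outputWeights, outputBiases);

        System.out.println("actualOUTPUT = " + actualOutputs[0]);
    }
}
```

The output of every neuron needs to be kept after this pass, because the delta-error calculations reuse those values.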

Step 2: Calculate delta-error for the neurons in the last layer (output layer).
The delta-error calculation for the neuron(s) in the last layer of the neural network is a little bit different from the other layers. You can work this out once you have calculated the actualOUTPUTs from the feed-forward pass.

Let  Last Layer Neuron1.deltaError = LLN1.dE
       Last Layer.actualOutput1 = aO1            <--- This is the same as the Neuron1 Output Value
       Last Layer.expectedOutput1 = exO1

  •        LLN1.dE = (aO1) x (1-aO1) x (exO1 - aO1);

Once you have calculated the deltaError for every neuron in the last layer (output layer), you can move onto the next step.
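
The last layer formula above could be sketched as follows. The class and method names are hypothetical, chosen only for this example, and the numbers in main are made up.

```java
// Illustrative sketch only: names are assumptions, not the tutorial's code.
class OutputDeltaSketch {

    // LLN.dE = (aO) x (1-aO) x (exO - aO)
    static double outputDeltaError(double actualOutput, double expectedOutput) {
        return actualOutput * (1.0 - actualOutput) * (expectedOutput - actualOutput);
    }

    // Apply the same formula to every neuron in the last layer.
    static double[] outputLayerDeltaErrors(double[] actualOutputs, double[] expectedOutputs) {
        double[] deltaErrors = new double[actualOutputs.length];
        for (int n = 0; n < actualOutputs.length; n++) {
            deltaErrors[n] = outputDeltaError(actualOutputs[n], expectedOutputs[n]);
        }
        return deltaErrors;
    }

    public static void main(String[] args) {
        // Made-up example: two output neurons, expected results 1 and 0.
        double[] actualOutputs = {0.5, 0.8};
        double[] expectedOutputs = {1.0, 0.0};
        double[] deltaErrors = outputLayerDeltaErrors(actualOutputs, expectedOutputs);
        for (int n = 0; n < deltaErrors.length; n++) {
            System.out.println("LLN" + (n + 1) + ".dE = " + deltaErrors[n]);
        }
    }
}
```

The sign of the deltaError tells you which way to move: it is positive when the actual output was too low and negative when it was too high.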

Step 3: Calculate the delta-error for the hidden layer neurons

The hidden layers of this neural network are all the layers that are not the last layer, and the layers sit one after another like ducks in a row. We are now going to calculate the delta error for the second-last layer in the neural network. This could in theory be the first layer in the network (if the network only had 2 layers).

HLN = Hidden Layer Neuron
LLN = Last Layer Neuron

HLN.dE = (HLN.aO) x (1-HLN.aO) x (Sum of   [LLN.dE   x   LLN to HLN connection weight])

The sum runs over every last layer neuron that the hidden layer neuron feeds into.

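Keep moving back through the network layers until you reach the 1st layer (i.e. you run out of layers). Each time you step back, the layer you have just processed plays the role of the "last layer" in the formula above.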
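
Here is a rough sketch of the hidden layer formula. It assumes the weights are stored with the receiving neuron, so nextWeights[k][j] is the weight of the connection into next-layer neuron k from hidden neuron j. That layout, and all of the names and numbers, are assumptions made for this example.

```java
// Illustrative sketch only: names and array layout are assumptions, not the tutorial's code.
class HiddenDeltaSketch {

    // HLN.dE = (HLN.aO) x (1-HLN.aO) x (Sum of [LLN.dE x LLN to HLN connection weight])
    // hiddenOutputs[j] : actual output of hidden neuron j (from the feed-forward pass)
    // nextDeltas[k]    : deltaError of neuron k in the layer closer to the output
    // nextWeights[k][j]: weight of the connection into neuron k from hidden neuron j
    static double[] hiddenLayerDeltaErrors(double[] hiddenOutputs,
                                           double[] nextDeltas,
                                           double[][] nextWeights) {
        double[] deltaErrors = new double[hiddenOutputs.length];
        for (int j = 0; j < hiddenOutputs.length; j++) {
            double sum = 0.0;
            for (int k = 0; k < nextDeltas.length; k++) {
                sum += nextDeltas[k] * nextWeights[k][j];
            }
            deltaErrors[j] = hiddenOutputs[j] * (1.0 - hiddenOutputs[j]) * sum;
        }
        return deltaErrors;
    }

    public static void main(String[] args) {
        // Made-up example: 2 hidden neurons feeding 2 last layer neurons.
        double[] hiddenOutputs = {0.5, 0.2};
        double[] lastLayerDeltas = {0.125, -0.128};
        double[][] lastLayerWeights = {{0.4, 1.0}, {0.5, -1.0}};
        double[] hiddenDeltas = hiddenLayerDeltaErrors(hiddenOutputs, lastLayerDeltas, lastLayerWeights);
        for (int j = 0; j < hiddenDeltas.length; j++) {
            System.out.println("HLN" + (j + 1) + ".dE = " + hiddenDeltas[j]);
        }
    }
}
```

For a network with more than one hidden layer, the same method can be called again, passing in the deltaErrors it just returned as nextDeltas for the layer before.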

Step 4: Update the weights of the connections and the bias of each neuron.
a) Multiply the neuron's deltaError, which was calculated in either Step 2 or Step 3, by the learning rate (0.1) and by the connection's connEntry value.
b) Then add the value calculated in Step 4a to the current weight of the connection.

neuron.connections[i].weight += (learningRate * neuron.connections[i].connEntry * neuron.deltaError);

The bias is like a connection with a constant connEntry of 1, so the calculation is:

neuron.bias +=  (learningRate * 1 * neuron.deltaError);
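
Putting the two update lines into a small runnable sketch might look like this. The Neuron and Connection classes here are cut-down stand-ins that hold only the fields used in the lines above; they are assumptions for this example and not the full classes from the rest of the series.

```java
// Illustrative sketch only: cut-down stand-in classes, not the tutorial's full code.
class WeightUpdateSketch {

    static class Connection {
        double connEntry; // value that entered this connection during the feed-forward pass
        double weight;

        Connection(double connEntry, double weight) {
            this.connEntry = connEntry;
            this.weight = weight;
        }
    }

    static class Neuron {
        Connection[] connections;
        double bias;
        double deltaError; // calculated in Step 2 or Step 3

        Neuron(Connection[] connections, double bias, double deltaError) {
            this.connections = connections;
            this.bias = bias;
            this.deltaError = deltaError;
        }
    }

    // Step 4: nudge every connection weight, then the bias, of one neuron.
    static void updateNeuron(Neuron neuron, double learningRate) {
        for (int i = 0; i < neuron.connections.length; i++) {
            neuron.connections[i].weight += (learningRate * neuron.connections[i].connEntry * neuron.deltaError);
        }
        neuron.bias += (learningRate * 1 * neuron.deltaError);
    }

    public static void main(String[] args) {
        // Made-up example: one neuron with two connections.
        Connection[] connections = {new Connection(1.0, 0.5), new Connection(0.0, -0.3)};
        Neuron neuron = new Neuron(connections, 0.2, 0.125);
        updateNeuron(neuron, 0.1);
        System.out.println("weight 0 = " + neuron.connections[0].weight);
        System.out.println("weight 1 = " + neuron.connections[1].weight);
        System.out.println("bias     = " + neuron.bias);
    }
}
```

In this example the second connection had a connEntry of 0, so its weight is left unchanged: a connection that passed nothing through did not contribute to the error on that pass.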

Up Next: Neural Network (Part 6):

To go back to the table of contents click here


  1. Hi, I wonder how you denormalize the output values from 0 to 1? Thanks

    1. I am not sure what you mean. Please explain in more detail.

  2. why are you multiplying neuron.connections[i].connEntry when updating the weights in step 4?

    1. Good question - it has been a while since I have done this. I cannot remember the reason anymore. But you could try with and without, and see what difference it makes to the speed/effectiveness of learning. i.e. compare the two.

  3. do you think i could fit a program like this on an ATTiny85? (just two inputs, two outputs and 3 hidden nodes) I want my bot to be tiny but similar to this program i wrote: https://scratch.mit.edu/projects/120760080/, except i want it to learn instead of preset weights. thanks in advance!

    1. Hi Braden - you have a very cool project.

      I don't know how big your program is but you may find that you may struggle with the amount of memory available on the ATTiny85.
      I am not a programmer by trade, and so my program may very well have numerous inefficiencies, but I struggled to fit it onto an Arduino UNO, hence my reason for getting my computer to do the learning side, and leaving the Arduino with the preset weights... but you may be a more capable programmer and may figure out how to do it better than me... so all I can say is give it a go and see what happens.

      You may be interested in seeing the difference between the atmega 328 and the attiny85 in terms of specifications:
      atmega328 vs attiny85 .

