
EDIT2:

New training set...

Inputs:

[
 [0.0, 0.0], 
 [0.0, 1.0], 
 [0.0, 2.0], 
 [0.0, 3.0], 
 [0.0, 4.0], 
 [1.0, 0.0], 
 [1.0, 1.0], 
 [1.0, 2.0], 
 [1.0, 3.0], 
 [1.0, 4.0], 
 [2.0, 0.0], 
 [2.0, 1.0], 
 [2.0, 2.0], 
 [2.0, 3.0], 
 [2.0, 4.0], 
 [3.0, 0.0], 
 [3.0, 1.0], 
 [3.0, 2.0], 
 [3.0, 3.0], 
 [3.0, 4.0],
 [4.0, 0.0], 
 [4.0, 1.0], 
 [4.0, 2.0], 
 [4.0, 3.0], 
 [4.0, 4.0]
]

Outputs:

[
 [0.0], 
 [0.0], 
 [0.0], 
 [0.0], 
 [0.0], 
 [0.0], 
 [0.0], 
 [0.0], 
 [0.0], 
 [0.0], 
 [0.0], 
 [0.0], 
 [0.0], 
 [0.0], 
 [0.0], 
 [0.0], 
 [0.0], 
 [0.0], 
 [1.0], 
 [1.0], 
 [0.0], 
 [0.0], 
 [0.0], 
 [1.0], 
 [1.0]
]

EDIT1:

I have updated the question with my latest code. I fixed a few minor issues, but I am still getting the same output for all input combinations after the network has learned.

Here is the backprop algorithm explained: Backprop algorithm


Yes, this is homework, just to make that clear right at the beginning.

I am supposed to implement a simple backpropagation algorithm on a simple neural network.

I have chosen Python as the language for this task, and I have chosen a neural network like this:

3 layers: 1 input, 1 hidden, 1 output layer:

O         O
                    O
O         O

There is an integer on each of the two input neurons and a 1 or 0 on the output neuron.

Here is my entire implementation (a bit long). Below it I will pick out the shorter snippets where I think the error could be:

import os
import math
import Image
import random
from random import sample

#------------------------------ class definitions

class Weight:
    def __init__(self, fromNeuron, toNeuron):
        self.value = random.uniform(-0.5, 0.5)
        self.fromNeuron = fromNeuron
        self.toNeuron = toNeuron
        fromNeuron.outputWeights.append(self)
        toNeuron.inputWeights.append(self)
        self.delta = 0.0 # delta value; this accumulates and after each training cycle is used to adjust the weight value

    def calculateDelta(self, network):
        self.delta += self.fromNeuron.value * self.toNeuron.error

class Neuron:
    def __init__(self):
        self.value = 0.0        # the output
        self.idealValue = 0.0   # the ideal output
        self.error = 0.0        # error between output and ideal output
        self.inputWeights = []
        self.outputWeights = []

    def activate(self, network):
        x = 0.0;
        for weight in self.inputWeights:
            x += weight.value * weight.fromNeuron.value
        # sigmoid function
        if x < -320:
            self.value = 0
        elif x > 320:
            self.value = 1
        else:
            self.value = 1 / (1 + math.exp(-x))

class Layer:
    def __init__(self, neurons):
        self.neurons = neurons

    def activate(self, network):
        for neuron in self.neurons:
            neuron.activate(network)

class Network:
    def __init__(self, layers, learningRate):
        self.layers = layers
        self.learningRate = learningRate # the rate at which the network learns
        self.weights = []
        for hiddenNeuron in self.layers[1].neurons:
            for inputNeuron in self.layers[0].neurons:
                self.weights.append(Weight(inputNeuron, hiddenNeuron))
            for outputNeuron in self.layers[2].neurons:
                self.weights.append(Weight(hiddenNeuron, outputNeuron))

    def setInputs(self, inputs):
        self.layers[0].neurons[0].value = float(inputs[0])
        self.layers[0].neurons[1].value = float(inputs[1])

    def setExpectedOutputs(self, expectedOutputs):
        self.layers[2].neurons[0].idealValue = expectedOutputs[0]

    def calculateOutputs(self, expectedOutputs):
        self.setExpectedOutputs(expectedOutputs)
        self.layers[1].activate(self) # activation function for hidden layer
        self.layers[2].activate(self) # activation function for output layer        

    def calculateOutputErrors(self):
        for neuron in self.layers[2].neurons:
            neuron.error = (neuron.idealValue - neuron.value) * neuron.value * (1 - neuron.value)

    def calculateHiddenErrors(self):
        for neuron in self.layers[1].neurons:
            error = 0.0
            for weight in neuron.outputWeights:
                error += weight.toNeuron.error * weight.value
            neuron.error = error * neuron.value * (1 - neuron.value)

    def calculateDeltas(self):
        for weight in self.weights:
            weight.calculateDelta(self)

    def train(self, inputs, expectedOutputs):
        self.setInputs(inputs)
        self.calculateOutputs(expectedOutputs)
        self.calculateOutputErrors()
        self.calculateHiddenErrors()
        self.calculateDeltas()

    def learn(self):
        for weight in self.weights:
            weight.value += self.learningRate * weight.delta

    def calculateSingleOutput(self, inputs):
        self.setInputs(inputs)
        self.layers[1].activate(self)
        self.layers[2].activate(self)
        #return round(self.layers[2].neurons[0].value, 0)
        return self.layers[2].neurons[0].value


#------------------------------ initialize objects etc


inputLayer = Layer([Neuron() for n in range(2)])
hiddenLayer = Layer([Neuron() for n in range(100)])
outputLayer = Layer([Neuron() for n in range(1)])

learningRate = 0.5

network = Network([inputLayer, hiddenLayer, outputLayer], learningRate)

# just for debugging, the real training set is much larger
trainingInputs = [
    [0.0, 0.0],
    [1.0, 0.0],
    [2.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [2.0, 1.0],
    [0.0, 2.0],
    [1.0, 2.0],
    [2.0, 2.0]
]
trainingOutputs = [
    [0.0],
    [1.0],
    [1.0],
    [0.0],
    [1.0],
    [0.0],
    [0.0],
    [0.0],
    [1.0]
]

#------------------------------ let's train

for i in range(500):
    for j in range(len(trainingOutputs)):
        network.train(trainingInputs[j], trainingOutputs[j])
        network.learn()

#------------------------------ let's check


for pattern in trainingInputs:
    print network.calculateSingleOutput(pattern)

Now, the problem is that after learning, the network returns a float very close to 0.0 for all input combinations, even for those whose output should be close to 1.0.

I train the network for 500 cycles; in each cycle I do:

For every set of inputs in the training set:

  • Set network inputs
  • Calculate outputs by using a sigmoid function
  • Calculate errors in the output layer
  • Calculate errors in the hidden layer
  • Calculate weights' deltas

Then I adjust the weights based on the learning rate and the accumulated deltas.
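
As a minimal sketch, using the method names from the code above, one training cycle of this scheme looks roughly like this (note that the posted loop actually calls learn() after every pattern rather than once per cycle):

for cycle in range(500):
    for inputs, expected in zip(trainingInputs, trainingOutputs):
        network.train(inputs, expected)  # forward pass, errors, accumulated deltas
    network.learn()                      # adjust weights from the accumulated deltas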

Here is my activation function for neurons:

def activationFunction(self, network):
    """
    Compute the neuron's activation: the sigmoid of the weighted sum of the
    values of the neurons its input weights come from
    """
    x = 0.0;
    for weight in self.inputWeights:
        x += weight.value * weight.getFromNeuron(network).value
    # sigmoid function
    self.value = 1 / (1 + math.exp(-x))

This is how I calculate the deltas:

def calculateDelta(self, network):
    self.delta += self.getFromNeuron(network).value * self.getToNeuron(network).error
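
In equation form, the error, delta and weight-update rules implemented above are (with η the learning rate, t and o the ideal and actual values of the output neuron, h the value of a hidden neuron, and x_i the value of a weight's source neuron):

$$\delta_o = (t - o)\,o\,(1 - o)$$
$$\delta_h = \Big(\sum_{o} w_{h\to o}\,\delta_o\Big)\,h\,(1 - h)$$
$$\Delta w_{i\to j} \leftarrow \Delta w_{i\to j} + x_i\,\delta_j, \qquad w_{i\to j} \leftarrow w_{i\to j} + \eta\,\Delta w_{i\to j}$$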

This is the general flow of my algorithm:

for i in range(numberOfIterations):
    for k,expectedOutput in trainingSet.iteritems():
        coordinates = k.split(",")
        network.setInputs((float(coordinates[0]), float(coordinates[1])))
        network.calculateOutputs([float(expectedOutput)])
        network.calculateOutputErrors()
        network.calculateHiddenErrors()
        network.calculateDeltas()
    oldWeights = network.weights
    network.adjustWeights()
    network.resetDeltas()
    print "Iteration ", i
    j = 0
    for weight in network.weights:
        print "Weight W", weight.i, weight.j, ": ", oldWeights[j].value, " ............ Adjusted value : ", weight.value
        j += 1

The last two lines of the output are:

0.552785449458 # this should be close to 1
0.552785449458 # this should be close to 0

It actually returns the same output value for all input combinations.

Am I missing something?

Richard Knop
    I think you are going to have to do some more work yourself -- this is more code than you can reasonably expect people to debug for you. Add `logging.log` statements in all important places to trace the weights of the edges and work through the numerics with a calculator for a few steps to see where they disagree. – Katriel Oct 21 '10 at 14:06
  • Read this: http://stackoverflow.com/questions/3704570/in-python-small-floats-tending-to-zero. For Bayesian filters, this is a standard problem, with a standard solution. You seem to have the same standard problem with very, very small floats. – S.Lott Oct 21 '10 at 16:00
  • @katrielalex Yeah I will continue working on this as well, of course. – Richard Knop Oct 21 '10 at 22:29
  • @S.Lott: the problem can't come from there, as the OP already uses logarithms for weights; that's why `math.exp` is necessary. That leads to another problem: Python raises an exception when x becomes too small or too large, but that is not related to the observed bogus behavior (just a plain old bug). – kriss Oct 22 '10 at 00:13
  • Just add: `self.layers[2].runActivationFunctionForAllNeurons(self)` in `calculateSingleOutput` and it will work. But bugfixes aside, convergence is worse than in the first version after your edit, which is surprising. I do not see which change has this effect. – kriss Oct 22 '10 at 01:02
  • Yep, it works now. I added self.layers[1].runActivationFunctionForAllNeurons(self) to the calculateSingleOutput method. But it learns kinda slow. I was expecting a faster learning process. – Richard Knop Oct 22 '10 at 17:09
  • It slowed down because of the resetDelta method... I don't know why I added it there. It's gone now and it converges fast. – Richard Knop Oct 22 '10 at 17:17

1 Answer


Looks like what you get is nearly the initial state of the Neuron (nearly self.idealValue). Maybe you should not initialize this Neuron before you have actual data to provide?

EDIT: OK, I looked a bit deeper into the code and simplified it a bit (I will post the simplified version below). Basically your code has two minor errors (they look like things you just overlooked), but they lead to a network that definitely won't work.

  • you forgot to set the value of expectedOutput in the output layer during the learning phase. Without that the network definitely can't learn anything and will always be stuck with the initial idealValue. (That is the behavior I spotted at first reading.) This one could even have been spotted from your description of the training steps (and probably would have been if you hadn't posted the code; this is one of the rare cases I know of where posting the code actually hid the error instead of making it obvious). You fixed this one after your EDIT1.
  • when activating the network in calculateSingleOutput, you forgot to activate the hidden layer.

Obviously, either of these two problems is enough to make the network dysfunctional.
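
For reference, a minimal sketch of the corrected evaluation method, using the class and method names from the question (the EDIT2 version of calculateSingleOutput already contains this fix):

def calculateSingleOutput(self, inputs):
    self.setInputs(inputs)
    self.layers[1].activate(self)  # hidden layer -- this activation was the missing step
    self.layers[2].activate(self)  # output layer
    return self.layers[2].neurons[0].value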

Once corrected, it works (well, it does in my simplified version of your code).

The errors were not easy to spot because the initial code was much too complicated. You should think twice before introducing new classes or new methods. Not creating enough methods or classes makes code hard to read and maintain, but creating too many can make it even harder to read and maintain. You have to find the right balance. My personal method for finding this balance is to follow code smells and refactoring techniques wherever they lead me, sometimes adding methods or classes, sometimes removing them. It's certainly not perfect, but that's what works for me.

Below is my version of the code after some refactoring. I spent about an hour changing your code while always keeping it functionally equivalent. I took it as a good refactoring exercise, as the initial code was really horrible to read. After refactoring, it took just 5 minutes to spot the problems.

import os
import math

"""
A simple backprop neural network. It has 3 layers:
    Input layer: 2 neurons
    Hidden layer: 2 neurons
    Output layer: 1 neuron
"""

class Weight:
    """
    Class representing a weight between two neurons
    """
    def __init__(self, value, from_neuron, to_neuron):
        self.value = value
        self.from_neuron = from_neuron
        from_neuron.outputWeights.append(self)
        self.to_neuron = to_neuron
        to_neuron.inputWeights.append(self)

        # delta value, this will accumulate and after each training cycle
        # will be used to adjust the weight value
        self.delta = 0.0

class Neuron:
    """
    Class representing a neuron.
    """
    def __init__(self):
        self.value = 0.0        # the output
        self.idealValue = 0.0   # the ideal output
        self.error = 0.0        # error between output and ideal output
        self.inputWeights = []    # weights that end in the neuron
        self.outputWeights = []  # weights that starts in the neuron

    def activate(self):
        """
        Compute the neuron's activation: the sigmoid of the weighted sum of
        the values of the neurons its input weights come from
        """
        x = 0.0;
        for weight in self.inputWeights:
            x += weight.value * weight.from_neuron.value
        # sigmoid function
        self.value = 1 / (1 + math.exp(-x))

class Network:
    """
    Class representing a whole neural network. Contains layers.
    """
    def __init__(self, layers, learningRate, weights):
        self.layers = layers
        self.learningRate = learningRate    # the rate at which the network learns
        self.weights = weights

    def training(self, entries, expectedOutput):
        # load the inputs into the input layer and the target into the output layer
        for i in range(len(entries)):
            self.layers[0][i].value = entries[i]
        for i in range(len(expectedOutput)):
            self.layers[2][i].idealValue = expectedOutput[i]
        # forward pass: activate the hidden layer, then the output layer
        for layer in self.layers[1:]:
            for n in layer:
                n.activate()
        # error of the output neuron(s)
        for n in self.layers[2]:
            error = (n.idealValue - n.value) * n.value * (1 - n.value)
            n.error = error
        # error of the hidden neurons, backpropagated through the outgoing weights
        # (note: this does not multiply by the sigmoid derivative n.value * (1 - n.value),
        # unlike calculateHiddenErrors in the question's code)
        for n in self.layers[1]:
            error = 0.0
            for w in n.outputWeights:
                error += w.to_neuron.error * w.value
            n.error = error
        # accumulate the weight deltas; they are applied later in updateWeights()
        for w in self.weights:
            w.delta += w.from_neuron.value * w.to_neuron.error

    def updateWeights(self):
        for w in self.weights:
            w.value += self.learningRate * w.delta

    def calculateSingleOutput(self, entries):
        """
        Calculate a single output for input values.
        This will be used to debug the already learned network after training.
        """
        for i in range(len(entries)):
            self.layers[0][i].value = entries[i]
        # activation function for output layer
        for layer in self.layers[1:]:
            for n in layer:
                n.activate()
        print self.layers[2][0].value


#------------------------------ initialize objects etc

neurons = [Neuron() for n in range(5)]

w1 = Weight(-0.79, neurons[0], neurons[2])
w2 = Weight( 0.51, neurons[0], neurons[3])
w3 = Weight( 0.27, neurons[1], neurons[2])
w4 = Weight(-0.48, neurons[1], neurons[3])
w5 = Weight(-0.33, neurons[2], neurons[4])
w6 = Weight( 0.09, neurons[3], neurons[4])

weights = [w1, w2, w3, w4, w5, w6]
inputLayer  = [neurons[0], neurons[1]]
hiddenLayer = [neurons[2], neurons[3]]
outputLayer = [neurons[4]]
learningRate = 0.3
network = Network([inputLayer, hiddenLayer, outputLayer], learningRate, weights)

# just for debugging, the real training set is much larger
trainingSet = [([0.0,0.0],[0.0]),
               ([1.0,0.0],[1.0]),
               ([2.0,0.0],[1.0]),
               ([0.0,1.0],[0.0]),
               ([1.0,1.0],[1.0]),
               ([2.0,1.0],[0.0]),
               ([0.0,2.0],[0.0]),
               ([1.0,2.0],[0.0]),
               ([2.0,2.0],[1.0])]

#------------------------------ let's train
for i in range(100): # training iterations
    for entries, expectedOutput in trainingSet:
        network.training(entries, expectedOutput)
    network.updateWeights()

#network has learned, let's check
network.calculateSingleOutput((1, 0)) # this should be close to 1
network.calculateSingleOutput((0, 0)) # this should be close to 0

By the way, there is still a third problem I didn't correct (but it is easy to correct). If x is too big or too small (> 320 or < -320), math.exp() will raise an exception. This will occur if you run enough training iterations, say a few thousand. The simplest fix I see is to check the value of x and, if it is too big or too small, set the Neuron's value to 0 or 1 depending on the case, since those are the limit values of the sigmoid.
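
A minimal sketch of that guard, which is essentially what the activate method in the question's updated code already does:

import math

def sigmoid(x):
    # clamp to avoid an overflow in math.exp for very large |x|;
    # at |x| = 320 the sigmoid is already indistinguishable from 0 or 1 as a float
    if x < -320:
        return 0.0
    if x > 320:
        return 1.0
    return 1.0 / (1.0 + math.exp(-x))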

kriss
  • Well, I will try that tomorrow. – Richard Knop Oct 21 '10 at 20:28
  • Thanks very much. Yeah, I guess I overcomplicated it. I just wanted to avoid procedural programming and do everything in OOP and I got carried away. – Richard Knop Oct 22 '10 at 07:40
  • By the way, try print network.calculateSingleOutput(2.0, 1.0). It will print incorrect output :) – Richard Knop Oct 23 '10 at 22:27
  • @Richard Knop: do you mean `network.calculateSingleOutput([2.0, 1.0])`? (The entries parameter expects a single argument that is a list of two numbers; your version will raise an error.) With 100 learning iterations it yields 0.04, not exactly the zero expected, but not something I would call incorrect either; it's still close to zero. – kriss Oct 24 '10 at 02:35
  • @Richard Knop: OK, I got it. It works with my version, not with yours. I guess it's something that changed with EDIT1, as I refactored from the initial version. The reason for the problem is (again) not obvious to me; you will have to check the difference yourself. – kriss Oct 24 '10 at 02:41
  • Thanks. I will go line by line and try to find the difference. – Richard Knop Oct 24 '10 at 10:46
  • But I still think your version is not working correctly. network.calculateSingleOutput((1, 1)) should return 1, as should network.calculateSingleOutput((2, 2)). – Richard Knop Oct 24 '10 at 10:50
  • But anyway, I got my code working; I am doing some refactoring now. I will post the final version later. – Richard Knop Oct 24 '10 at 12:43
  • @Richard Knop: yes, there is something wrong in my version. I will also check and post corrected version. – kriss Oct 24 '10 at 13:06
  • Damn. I've found out my code does not work correctly either :) I will have to investigate where the error in my code is. It shows the same incorrect behavior as yours. – Richard Knop Oct 31 '10 at 14:07
  • @kriss Any luck? :) I've found out my code is the same as yours. It does not work for all patterns. There must be some mistake. – Richard Knop Nov 01 '10 at 21:30
  • @Richard Knop: I will have a look at the code again; I was busy with something else these days. No surprise our two versions have the same behavior: mine is supposed to be nothing but a refactoring of yours, so no behavior change was expected (apart from the two fixes I already spoke of). I will look for another remaining problem. – kriss Nov 01 '10 at 22:47
  • By the way, I have looked at the Wikipedia page - http://en.wikipedia.org/wiki/Backpropagation - and on the bottom there are links to different implementations (C, C++, Python, Ruby, PHP, C#, Java). I have just tried to use my training set for the Python implementation from the Wikipedia - http://arctrix.com/nas/python/bpnn.py - and it still got the wrong output for one input pattern from my training set :P – Richard Knop Nov 01 '10 at 23:00
  • @Richard Knop: I wonder if it's really an error. Convergence of the backpropagation algorithm is not guaranteed, and with your test set we could be in a non-convergent case. That would also explain why other implementations have the same behavior. Basically, neural networks behave well when there are hidden rules to learn. One such rule here is that when the first input parameter is 0 the result is always 0; you can try the network on it, and this rule works with other values not in the training set, like `[0.0, 77.0] -> 0`. But it is not obvious there is any other hidden rule, so the network may have a hard time converging. – kriss Nov 01 '10 at 23:54
  • @kriss Hmm. Well, now I am trying a new training set (check my updated question) and it's still not working correctly. There are multiple incorrect outputs. – Richard Knop Nov 02 '10 at 01:09
  • @Richard Knop: Well, the new training set is even more complex than the previous one. Could it be that your training sets are just too complex for only two neurons in the hidden layer? It looks easy to try with three neurons in the hidden layer. – kriss Nov 02 '10 at 02:41
  • @kriss Ok, I will try that. I will try using 100 neurons in the hidden layer to see if it helps. – Richard Knop Nov 02 '10 at 12:00
  • @Richard Knop: another point to look at is the use of a sigmoid for activation; networks with hidden layers often use a Gaussian (the desired effect is that some neurons are activated by specific values), and I don't know if the sigmoid gives as good results. – kriss Nov 02 '10 at 16:08
  • @kriss I tried it with 100 neurons in the hidden layer and it still cannot get all patterns right, even with my original small training set. I will update my original question with my latest code. – Richard Knop Nov 02 '10 at 19:46
  • @Richard Knop: if it can help, I tried the Python implementation from Wikipedia with your small training set; with 4 hidden neurons, input [2, 1] gives -0.33 instead of 1 (but it does not seem to get any better, even with more neurons, and we still hope for 0). – kriss Nov 03 '10 at 01:30
  • @kriss Well, actually the training set I am supposed to use is a 100x100 black/white gif image, so that is 10000 input patterns. The problem is that the larger the training set I use, the less accurate it gets. With 250 input patterns, after learning, the network returns 0 for all inputs, all the time. So something is definitely wrong. I tried using 4 hidden neurons, 10, 20, 100, 200 and so on, and it does not get more accurate. – Richard Knop Nov 03 '10 at 18:34
  • @kriss I guess I might just start a new question and I will try to choose different tags to attract more people :P – Richard Knop Nov 03 '10 at 18:36
  • @Richard Knop: Yes, there is definitely something going wrong, but I do not see what either. I would bet on something at the conceptual level, not the implementation, maybe the formula used for computing the errors or the deltas? It definitely should be able to converge with the small training set after adding a few nodes, so it's no wonder the bigger example does not work either. Opening a new question looks like a good idea. I may also open one, as I'm interested to know if there is reference material on the number of units necessary to find a solution when problems get more complex. – kriss Nov 03 '10 at 21:06