I've created a program that lets me build flexible neural networks of any size/depth, but I'm testing it with a simple XOR setup (feed-forward, sigmoid activation, backpropagation, no batching).
EDIT: The following is a completely new approach to my original question, which didn't supply enough information.
EDIT 2: I now start my weights between -2.5 and 2.5, and I fixed a problem in my code where I had forgotten some negatives. Now it either converges to 0 for all cases or to 1 for all cases, instead of 0.5.
Everything works exactly the way that I THINK it should, yet it converges toward 0.5 instead of separating the outputs into 0 and 1. I've completely gone through and hand-calculated an entire setup of feeding forward, calculating delta errors, backpropagating, etc., and it matched what I got from the program. I have also tried tuning the learning rate and momentum, as well as increasing the network's complexity (more neurons/layers).
Because of this, I assume that either one of my equations is wrong, or I have some other misunderstanding of how a neural network should work. The following is the logic, with equations, that I follow for each step:
I have an input layer with two inputs and a bias, a hidden layer with 2 neurons and a bias, and an output layer with 1 neuron.
- Take the input from each of the two input neurons and the bias neuron, then multiply them by their respective weights, and then add them together as the input for each of the two neurons in the hidden layer.
- Take the input of each hidden neuron, pass it through the Sigmoid activation function (Reference 1) and use that as the neuron's output.
- Take the outputs of each neuron in hidden layer (1 for the bias), multiply them by their respective weights, and add those values to the output neuron's input.
- Pass the output neuron's input through the Sigmoid activation function, and use that as the output for the whole network.
- Calculate the delta error (Reference 2) for the output neuron.
- Calculate the delta error (Reference 3) for each of the 2 hidden neurons.
- Calculate the gradient (Reference 4) for each weight (starting from the end and working back).
- Calculate the delta weight (Reference 5) for each weight, and add it to that weight's value.
- Start the process over by changing the inputs and expected output (Reference 6).
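To show concretely how I read those steps, here is a minimal Java sketch of one training pass through the 2-2-1 network, with momentum omitted and all names and starting weight values invented by me for illustration:

```java
public class XorStep {
    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }
    static double sigmoidDerivative(double x) { double s = sigmoid(x); return s * (1.0 - s); }

    // hiddenW[j] = {weight from input 1, weight from input 2, weight from bias}
    static double[][] hiddenW = {{0.5, -0.4, 0.1}, {-0.3, 0.8, -0.2}};
    // outputW = {weight from hidden 1, weight from hidden 2, weight from bias}
    static double[] outputW = {0.6, -0.1, 0.2};
    static double learningRate = 0.5;

    // One forward + backward pass on a single pattern; returns the network output.
    static double trainOnce(double in1, double in2, double expected) {
        // Forward pass: weighted sums into the hidden layer, then sigmoid.
        double[] hiddenIn = new double[2], hiddenOut = new double[2];
        for (int j = 0; j < 2; j++) {
            hiddenIn[j] = in1 * hiddenW[j][0] + in2 * hiddenW[j][1] + 1.0 * hiddenW[j][2];
            hiddenOut[j] = sigmoid(hiddenIn[j]);
        }
        double outIn = hiddenOut[0] * outputW[0] + hiddenOut[1] * outputW[1] + 1.0 * outputW[2];
        double actual = sigmoid(outIn);

        // Delta errors (References 2 and 3), computed before any weights change.
        double outDelta = -1.0 * (actual - expected) * sigmoidDerivative(outIn);
        double[] hiddenDelta = new double[2];
        for (int j = 0; j < 2; j++) {
            hiddenDelta[j] = sigmoidDerivative(hiddenIn[j]) * outputW[j] * outDelta;
        }

        // Gradients and weight updates (References 4 and 5, momentum left out).
        outputW[0] += learningRate * hiddenOut[0] * outDelta;
        outputW[1] += learningRate * hiddenOut[1] * outDelta;
        outputW[2] += learningRate * 1.0 * outDelta;
        double[] ins = {in1, in2, 1.0};
        for (int j = 0; j < 2; j++) {
            for (int i = 0; i < 3; i++) {
                hiddenW[j][i] += learningRate * ins[i] * hiddenDelta[j];
            }
        }
        return actual;
    }

    public static void main(String[] args) {
        System.out.println("before: " + trainOnce(0, 1, 1));
        for (int k = 0; k < 500; k++) trainOnce(0, 1, 1);
        System.out.println("after:  " + trainOnce(0, 1, 1));
    }
}
```

Trained repeatedly on a single pattern like this, the output should move toward that pattern's target; if a one-pattern test like this still drifts toward 0.5, the bug is in the update step rather than in the training schedule.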
Here are the specifics of those references to equations/processes (this is probably where my problem is!):
- x is the input of the neuron:
(1/(1 + Math.pow(Math.E, (-1 * x))))
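Written out as Java methods (class and method names are mine), that sigmoid plus the derivative the later references rely on look like this; `Math.exp(-x)` is equivalent to `Math.pow(Math.E, -1 * x)`:

```java
public class SigmoidDemo {
    // Reference 1: logistic sigmoid, 1 / (1 + e^(-x)).
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Its derivative, expressed via the sigmoid itself: s'(x) = s(x) * (1 - s(x)).
    static double sigmoidDerivative(double x) {
        double s = sigmoid(x);
        return s * (1.0 - s);
    }

    public static void main(String[] args) {
        System.out.println(sigmoid(0.0));           // 0.5
        System.out.println(sigmoidDerivative(0.0)); // 0.25, the derivative's maximum
    }
}
```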
-1 * (actualOutput - expectedOutput) * (Sigmoid(x) * (1 - Sigmoid(x))) // Same Sigmoid used in Reference 1
SigmoidDerivative(Neuron.input) * (the sum of (Neuron.Weights * the deltaError of the neuron each weight connects to))
ParentNeuron.output * NeuronItConnectsTo.deltaError
learningRate*(weight.gradient) + momentum*(Previous Delta Weight)
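Plugged into code, Reference 5 is a one-liner; the numbers below are made up purely for illustration:

```java
public class MomentumUpdate {
    // Reference 5: new delta weight = learning-rate-scaled gradient plus a
    // momentum-scaled fraction of the previous delta weight.
    static double deltaWeight(double learningRate, double gradient,
                              double momentum, double previousDeltaWeight) {
        return learningRate * gradient + momentum * previousDeltaWeight;
    }

    public static void main(String[] args) {
        // Illustrative values: lr = 0.3, gradient = 0.05, momentum = 0.9,
        // previous delta weight = 0.02  ->  0.3*0.05 + 0.9*0.02 = 0.033
        System.out.println(deltaWeight(0.3, 0.05, 0.9, 0.02)); // ~0.033
    }
}
```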
- I have an ArrayList with the values 0,1,1,0 in it, in that order. It takes the first pair (0,1) and then expects a 1. For the second time through, it takes the second pair (1,1) and expects a 0. It just keeps iterating through the list for each new set. Perhaps training it in this systematic way causes the problem?
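For comparison, this is how I would cycle through the full XOR truth table in a fixed order; the pattern table below is the standard four cases, not taken from your list, so it may not match your pairing scheme:

```java
public class XorData {
    // The four XOR training cases: {in1, in2, expectedOutput}.
    static final double[][] PATTERNS = {
        {0, 0, 0},
        {0, 1, 1},
        {1, 0, 1},
        {1, 1, 0}
    };

    // Returns the pattern used on training iteration n, cycling in order.
    static double[] patternFor(int iteration) {
        return PATTERNS[iteration % PATTERNS.length];
    }

    public static void main(String[] args) {
        for (int n = 0; n < 6; n++) {
            double[] p = patternFor(n);
            System.out.println(p[0] + " XOR " + p[1] + " -> " + p[2]);
        }
    }
}
```

One thing worth double-checking in your loop is that every pass actually visits all four cases; a schedule that only ever shows the network a subset of them cannot converge to the full XOR function.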
Like I said before, the reason I don't think it's a code problem is that the program matched exactly what I had calculated with paper and pencil (which wouldn't have happened if there were a coding error).
Also, when I first initialize my weights, I give each one a random double value between 0 and 1. This article suggests that may lead to a problem: Neural Network with backpropogation not converging
Could that be it? I tried the n^(-1/2) rule, but that did not fix it.
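For what it's worth, here is a minimal sketch of that n^(-1/2) rule as I understand it: uniform weights in [-1/sqrt(fanIn), +1/sqrt(fanIn)], where fanIn is the number of inputs feeding the neuron (all names here are mine):

```java
import java.util.Random;

public class WeightInit {
    // Initialize a neuron's incoming weights uniformly in
    // [-1/sqrt(fanIn), +1/sqrt(fanIn)) -- the n^(-1/2) rule.
    static double[] initWeights(int fanIn, Random rng) {
        double limit = 1.0 / Math.sqrt(fanIn);
        double[] w = new double[fanIn];
        for (int i = 0; i < fanIn; i++) {
            w[i] = (rng.nextDouble() * 2.0 - 1.0) * limit; // uniform in [-limit, limit)
        }
        return w;
    }

    public static void main(String[] args) {
        // 3 incoming connections: two inputs plus the bias.
        double[] w = initWeights(3, new Random());
        for (double x : w) System.out.println(x);
    }
}
```

Note that, unlike weights drawn from [0, 1), these are centered on zero and can be negative, which is the part the linked answer says matters.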
If I can be more specific or you want other code let me know, thanks!