
I've recently started implementing a feed-forward neural network and I'm using back-propagation as the learning method. I've been using http://galaxy.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html as a guide.

However, after just the first epoch, my error is 0. Before using the network for my real purpose, I've tried it with this simple network structure:

  • 4 binary inputs, 1, 1, 0, 0.
  • 2 hidden layers, 4 neurons each.
  • 1 output neuron; an output of 1.0 should indicate a valid input.

Each training epoch runs the test input (1, 1, 0, 0), calculates the output error (sigmoid derivative * (1.0 - sigmoid)), back-propagates the error, and finally adjusts the weights.

Each neuron's new weight = weight + learning_rate * the neuron's error * the input to the weight.

Each hidden neuron's error = (sum over all output neurons of error * connecting weight) * the neuron's sigmoid derivative.
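For reference, here is a rough NumPy sketch of the update rules described above (simplified to a single hidden layer; this is not my actual code, and the names and weight ranges are just illustrative). Note that the textbook output error also includes a (target - output) factor:

```python
# Rough sketch of the per-epoch update described above (single sample, one hidden layer).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([1.0, 1.0, 0.0, 0.0])      # the test input
target = 1.0                             # 1.0 should mean "valid input"
lr = 0.0001                              # learning rate

rng = np.random.default_rng(0)
W1 = rng.uniform(-0.5, 0.5, (4, 4))      # input -> hidden weights
W2 = rng.uniform(-0.5, 0.5, (4, 1))      # hidden -> output weights

# forward pass
hidden = sigmoid(x @ W1)
output = sigmoid(hidden @ W2)

# output error: the textbook form includes the (target - output) term
out_err = (target - output) * output * (1.0 - output)

# hidden error: (sum of output errors * connecting weights) * sigmoid derivative
hid_err = (W2 @ out_err) * hidden * (1.0 - hidden)

# weight update: weight += learning_rate * error * input to the weight
W2 += lr * np.outer(hidden, out_err)
W1 += lr * np.outer(x, hid_err)
```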

The issue is that my learning rate has to be 0.0001 for me to see any sort of 'progress' between epochs in terms of a lowering error. In this case, the error starts at around 30.0. With any larger learning rate, the error drops to 0 after the first pass, which then produces false positives.

Also, when I try this network with my real data (a set of 32 audio features per sample, with 32 neurons per hidden layer), I get the same issue, to the point where any noise will trigger a false positive. This could possibly be an input-feature issue, but since I'm testing with a high-pitched note I can clearly see that the raw data differs from that of a low-pitched one.

I'm a neural networks newbie, so I'm almost positive the issue is with my network. Any help would be greatly appreciated.

jub
  • 0.0001 is not necessarily _too low_ for a learning rate; sometimes you do just need a small learning rate to make progress. If you have not done so already, try normalising the data as this often allows one to increase the size of the learning rate. – Hungry Sep 12 '14 at 09:25
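A minimal sketch of the kind of normalisation suggested in the comment, assuming (hypothetically) that the 32 audio features per sample are stacked into a `features` matrix:

```python
# Hypothetical z-score normalisation of the 32 audio features per sample.
# `features` is a placeholder (num_samples, 32) matrix, not data from the question.
import numpy as np

features = np.random.rand(100, 32)            # placeholder training data
mean = features.mean(axis=0)
std = features.std(axis=0) + 1e-8             # avoid division by zero
normalised = (features - mean) / std          # each feature: mean 0, std 1
```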

2 Answers


Although you state that you are using a standard NN approach of feed-forward / backprop, you have not described how you have actually implemented it. You mention that you are using the "galaxy" link as a guide, but I notice that, on the "galaxy" page, there is no mention of a bias being applied to the nodes. Perhaps you have not included this important component? There is a nice discussion of the role of bias applied to NN nodes by Nate Kohl; see Role of Bias in Neural Networks.
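As a rough illustration (not your code, and the names are just examples), a bias is usually just one extra weight per node whose input is fixed at 1.0:

```python
# Minimal sketch of a sigmoid node with a bias term (illustrative only).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def node_output(inputs, weights, bias_weight):
    # the bias acts like an extra weight whose input is always 1.0,
    # which lets the activation shift left or right
    return sigmoid(np.dot(inputs, weights) + bias_weight * 1.0)
```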

Next, rather than using two hidden layers, try using only one hidden layer. You may have to increase the number of nodes in that layer but, for most practical problems, you should be able to obtain a good solution with just one hidden layer. It will most likely be more stable and will certainly make it a lot easier for you to follow what is happening with backprop.

TonyMorland

Well, 0.0001 sounds reasonable to me. You might tweak the other constants or seed the initial neural network weights with a different random set.
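For example, a minimal sketch of re-seeding the initial weights (the sizes and ranges are hypothetical):

```python
# Hypothetical re-initialisation of the weights with a different random seed.
import numpy as np

rng = np.random.default_rng(seed=42)          # try a few different seeds
W1 = rng.uniform(-0.5, 0.5, size=(4, 4))      # input -> hidden weights
W2 = rng.uniform(-0.5, 0.5, size=(4, 1))      # hidden -> output weights
```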

If your training data is OK, then it's perfectly normal to need a thousand or more iterations until you get the right neural network.

There are many techniques to reach an end result faster. For example, as an activation function you might use TanH or ReLU. You could also decrease the learning rate from 0.001 towards 0.0001 over x epochs, or decrease it based upon the error rate.
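As an illustrative sketch only (the names and the schedule are just examples, not a prescribed recipe), the activations and a simple learning-rate decay could look like this:

```python
# Illustrative sketch: tanh / ReLU activations and a simple linear
# learning-rate decay from 0.001 towards 0.0001 over the epochs.
import numpy as np

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def learning_rate(epoch, total_epochs, start=0.001, end=0.0001):
    # linear decay; this could also be tied to the current error rate instead
    return start + (end - start) * (epoch / float(total_epochs))
```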

Peter