
I've gone through all my code and, if it really is the problem, I'm not sure how the bug has eluded me. The code is too long to post, so I'll describe my problem and what I've already tried; if you have any ideas about what else I could look into, I'd be very appreciative!

Ok, so firstly the weights are initialised with zero mean and a variance equal to 1 / sqrt(number of inputs to that neuron), as recommended by Haykin.
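
For reference, a minimal sketch of that initialisation scheme (the layer sizes are just placeholders, not my actual topology):

    import numpy as np

    def init_weights(n_inputs, n_neurons, rng=np.random.default_rng(0)):
        # Zero-mean Gaussian with variance = 1 / sqrt(fan-in),
        # i.e. the scheme described above.
        variance = 1.0 / np.sqrt(n_inputs)
        return rng.normal(loc=0.0, scale=np.sqrt(variance),
                          size=(n_inputs, n_neurons))

    # Hypothetical sizes: 1 input, 2 hidden neurons, 1 output.
    W_hidden = init_weights(1, 2)
    W_output = init_weights(2, 1)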

I've fed it a simple sine wave to learn on first. The weights in the hidden layer seem to converge so that every neuron in that layer gives the same output, which makes the output neuron produce a nearly fixed value.

So, what could be the cause? Firstly I checked whether the learning rate was causing the network to get stuck in a local minimum and increased it, and I also tried with and without momentum. I found that this rectified the problem somewhat, as the network DOES produce the sine wave. However, not properly! :(
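
For clarity, the momentum I'm describing is the standard gradient-descent-with-momentum update (a sketch with placeholder values, not my actual code):

    import numpy as np

    def momentum_step(W, grad_W, prev_delta, eta=0.1, alpha=0.9):
        # eta = learning rate, alpha = momentum coefficient (placeholder values).
        delta = -eta * grad_W + alpha * prev_delta
        return W + delta, delta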

The network output has an amplitude of roughly a third of the true height from the centre axis up, and it never goes below the axis. It looks as if you'd picked the sine wave up, squashed it to a third of its height, and raised it so that its lowest peaks sit on the axis. Furthermore, the top peaks are all flat...

I have since tried changing the network topology: if I add another hidden neuron (three in total), it suddenly gives only a fixed output again.

Fred Johnson
  • While your code may be too long, without code it's too hard to know what your problem is. Perhaps you could post a picture of your output plus the part of the code which you think is causing the problem. Is your activation function one with a limited range? For example, sigmoid output is limited to [0, 1]. It sounds like you may need to change your function or scale your data before feeding it into the network (and rescale on output). – Simon MᶜKenzie Apr 28 '13 at 23:58
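
A minimal sketch of the scaling the comment suggests, assuming sigmoid outputs in [0, 1] and sine targets in [-1, 1] (function names are illustrative, not from the question's code):

    import numpy as np

    # Sine targets lie in [-1, 1]; a logistic-sigmoid output lies in (0, 1),
    # so map the targets into that range for training and invert afterwards.
    def scale_to_unit(y):          # [-1, 1] -> [0, 1]
        return (y + 1.0) / 2.0

    def rescale_from_unit(y_hat):  # [0, 1] -> [-1, 1]
        return 2.0 * y_hat - 1.0

    x = np.linspace(0.0, 2.0 * np.pi, 200)
    targets = scale_to_unit(np.sin(x))      # feed these to the network
    # predictions = rescale_from_unit(network_output)   # after training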

1 Answer


A sine wave is not an easy problem for a neural network with sigmoid activation functions, and 3 hidden neurons are usually not enough. Take a look at this example: it uses 200 hidden nodes to approximate a sine wave on [0, 2*pi]. The activation functions in that example are a logistic sigmoid in the hidden layer and the identity in the output layer. You should always use the identity in the output layer for regression.
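
For illustration, here is a minimal sketch of that architecture (sigmoid hidden layer, identity output layer) trained with plain batch gradient descent; the hidden-layer size, learning rate, and epoch count are placeholders, and this is not the linked example's code:

    import numpy as np

    rng = np.random.default_rng(0)

    # Data: one input, one output, sine on [0, 2*pi].
    x = np.linspace(0.0, 2.0 * np.pi, 200).reshape(-1, 1)
    y = np.sin(x)

    n_hidden = 200
    W1 = rng.normal(0.0, 1.0, (1, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, 1))
    b2 = np.zeros(1)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    eta = 0.01
    for epoch in range(20000):
        # Forward pass: sigmoid hidden layer, identity (linear) output layer.
        h = sigmoid(x @ W1 + b1)
        y_hat = h @ W2 + b2

        # Backward pass for mean squared error.
        err = y_hat - y                      # dE/dy_hat up to a constant factor
        grad_W2 = h.T @ err / len(x)
        grad_b2 = err.mean(axis=0)
        dh = (err @ W2.T) * h * (1.0 - h)    # chain rule through the sigmoid
        grad_W1 = x.T @ dh / len(x)
        grad_b1 = dh.mean(axis=0)

        W2 -= eta * grad_W2; b2 -= eta * grad_b2
        W1 -= eta * grad_W1; b1 -= eta * grad_b1

    print("final MSE:", float(np.mean((y_hat - y) ** 2)))

Because the output layer is linear, the network's output is not confined to (0, 1) and no rescaling of the sine targets is needed.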

When you do not get a good result, it might also be beneficial to decrease the learning rate. Sometimes gradient descent oscillates between steep regions of the error function because the learning rate is too big.

alfa