6

I have implemented a neural network (using CUDA) with 2 layers (2 neurons per layer). I'm trying to make it learn 2 simple quadratic polynomial functions using backpropagation.

But instead of converging, it is diverging (the output is becoming infinity).

Here are some more details about what I've tried:

  • I had set the initial weights to 0, but since it was diverging I have randomized the initial weights
  • I read that a neural network might diverge if the learning rate is too high, so I reduced the learning rate to 0.000001
  • The two functions I am trying to get it to learn are: 3*i + 7*j + 9 and j*j + i*i + 24 (I am giving the layer i and j as input)
  • I had implemented it as a single layer previously and that could approximate the polynomial functions better
  • I am thinking of implementing momentum in this network but I'm not sure it would help it learn
  • I am using a linear (as in no) activation function
  • There is oscillation in the beginning, but the output starts diverging the moment any of the weights becomes greater than 1 (a minimal sketch of the setup described in this list follows below)
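
For reference, since I am not posting the CUDA code itself, here is a minimal NumPy sketch of the setup described above (2 inputs, two linear layers of 2 neurons each, plain backpropagation on the two target functions). The input range, batch size, and learning rate are placeholders, not the values from my code; it is only meant as a stable baseline to compare the divergence against.

    import numpy as np

    rng = np.random.default_rng(0)

    # Training data: inputs (i, j) and the two target functions from the question.
    X = rng.uniform(-1.0, 1.0, size=(256, 2))      # columns are i and j (assumed range)
    i, j = X[:, 0], X[:, 1]
    T = np.stack([3*i + 7*j + 9, i*i + j*j + 24], axis=1)

    # Two linear layers, 2 neurons each, with small random initial weights.
    W1 = rng.uniform(-0.5, 0.5, size=(2, 2)); b1 = np.zeros(2)
    W2 = rng.uniform(-0.5, 0.5, size=(2, 2)); b2 = np.zeros(2)
    lr = 0.05                                      # assumed learning rate

    for step in range(2001):
        # Forward pass (linear activations only).
        H = X @ W1 + b1
        Y = H @ W2 + b2
        err = Y - T
        loss = 0.5 * np.mean(np.sum(err**2, axis=1))

        # Backward pass for the squared error.
        dY = err / len(X)
        dW2 = H.T @ dY;  db2 = dY.sum(axis=0)
        dH = dY @ W2.T
        dW1 = X.T @ dH;  db1 = dH.sum(axis=0)

        # Gradient-descent update (note the minus signs).
        W1 -= lr * dW1;  b1 -= lr * db1
        W2 -= lr * dW2;  b2 -= lr * db2

        if step % 500 == 0:
            print(step, loss)   # the loss should settle at a finite plateau, not blow up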

I have checked and rechecked my code but there doesn't seem to be any kind of issue with it.

So here's my question: what is going wrong here?

Any pointers will be appreciated.

Shayan RC
  • I am sure there is nothing wrong with the code, and I believe the algorithm (backprop) is sufficiently well described for anyone familiar with neural networks. What I want to know is what else could have gone wrong. I am removing the CUDA tag to prevent misleading anyone into thinking this is a CUDA question – Shayan RC Aug 01 '13 at 05:16
  • 4
    From my experience, this can happen when your parameters overflow or some function returns infinity (the logarithm, for example, if you're using the logistic loss function). I would check for numerical problems first, e.g. with a gradient checker (a minimal gradient-check sketch follows these comments). But this is far too broad, so I guess we can't help you with that. – Thomas Jungblut Aug 01 '13 at 06:41
  • The output does not become infinity suddenly but gradually, after some initial oscillations. And it starts diverging only after any of the weights becomes greater than one. So it is not due to any one function returning infinity. I have added some more information. Hope it helps. – Shayan RC Aug 02 '13 at 03:56
  • duplicate of https://cs.stackexchange.com/questions/13587/neural-network-diverging-instead-of-converging – badp Jun 13 '19 at 09:44
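
Following up on the gradient-checker suggestion in the comments above, here is a minimal NumPy sketch of comparing an analytic gradient against central finite differences. The tiny one-layer model and all names here are hypothetical, purely for illustration of the technique:

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(8, 2))          # a small batch of (i, j) inputs
    T = rng.normal(size=(8, 2))          # arbitrary targets, just for the check
    W = rng.normal(size=(2, 2))          # the weight matrix whose gradient we check

    def loss(W):
        err = X @ W - T
        return 0.5 * np.mean(np.sum(err**2, axis=1))

    # Analytic gradient of the loss above (what backprop should produce).
    grad = X.T @ ((X @ W - T) / len(X))

    # Central finite-difference estimate, one entry at a time.
    eps = 1e-5
    num = np.zeros_like(W)
    for r in range(W.shape[0]):
        for c in range(W.shape[1]):
            Wp = W.copy(); Wp[r, c] += eps
            Wm = W.copy(); Wm[r, c] -= eps
            num[r, c] = (loss(Wp) - loss(Wm)) / (2 * eps)

    # The two estimates should agree to several decimal places.
    print(np.max(np.abs(grad - num)))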

2 Answers

4
  1. If the problem you are trying to solve is of the classification type, try a 3-layer network (3 is enough according to Kolmogorov). The connections from inputs A and B to a hidden node C (C = A*wa + B*wb) represent a line in AB space; that line divides the correct and incorrect half-spaces. The connections from the hidden layer to the output combine the hidden-layer values with each other, giving you the desired output.

  2. Depending on your data, the error function may look like a hair comb, so implementing momentum should help (a minimal momentum sketch follows this list). Keeping the learning rate at 1 proved optimal for me.

  3. Your training sessions will get stuck in local minima every once in a while, so network training will consist of a few subsequent sessions. If a session exceeds the maximum number of iterations, or the amplitude gets too high, or the error is obviously high, the session has failed; start another.

  4. At the beginning of each session, reinitialize your weights with random values in (-0.5, +0.5).

  5. It really helps to chart your error descent. You will get that "Aha!" factor.
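
To make item 2 concrete, here is a minimal NumPy sketch of a momentum update for a single weight matrix. The learning rate, the momentum coefficient of 0.9, and the placeholder gradient function are illustrative assumptions only:

    import numpy as np

    rng = np.random.default_rng(2)
    W = rng.uniform(-0.5, 0.5, size=(2, 2))   # item 4: random init in (-0.5, +0.5)
    V = np.zeros_like(W)                      # velocity (accumulated past gradients)
    lr, beta = 0.1, 0.9                       # beta is an assumed momentum coefficient

    def grad_of_loss(W):
        # Placeholder gradient for illustration; substitute your backprop gradient here.
        return W - np.array([[3.0, 0.0], [7.0, 0.0]])

    for step in range(300):
        g = grad_of_loss(W)
        V = beta * V - lr * g                 # decaying average of past descent directions
        W = W + V                             # the momentum step smooths a "hair comb" error surface

    print(W)                                  # approaches [[3, 0], [7, 0]] in this toy example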

Lex Podgorny
  • 3
    Do you have a reference for the Kolmogorov rule on the number of layers? – Luis Jun 27 '16 at 18:20
  • 1
    @Luis my impression is that the 3 layer thing is outdated, given the advent of deep learning. – chris Jan 25 '18 at 00:56
  • 1
    @ChrisAnderson The 3-layer "thing" is not a "thing". It is a mathematical reflection and analysis of what you want to do, why, and with which methods. Granted, you may add more and more layers (or lots of nodes, for that matter), but that doesn't guarantee that you're tackling your problem appropriately. The question about the number of layers remains interesting, especially for the original question: *I have implemented a neural network (using CUDA) with 2 layers. (2 Neurons per layer). I'm trying to make it learn 2 simple quadratic polynomial functions using backpropagation.* – Luis Jan 26 '18 at 14:55
  • Oh, sorry. I think I've seen someone recommend something like that _in general_, which frustrated me. What you said makes sense (I didn't pay attention to the actual question text). – chris Jan 27 '18 at 17:17
4

The most common reason for neural network code to diverge is that the coder has forgotten to put the negative sign in the weight-update expression.

Another reason could be a problem with the error expression used for calculating the gradients.
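
As a minimal illustration of both points, here is a NumPy sketch (the toy linear model and names are hypothetical) of a squared-error weight update: the error term is (output - target), and the update steps against the gradient with a minus sign:

    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.uniform(-1.0, 1.0, size=(64, 2))
    T = 3*X[:, :1] + 7*X[:, 1:] + 9           # a purely linear target, just for illustration
    W = rng.uniform(-0.5, 0.5, size=(2, 1))
    b = np.zeros(1)
    lr = 0.1

    for step in range(500):
        Y = X @ W + b
        err = Y - T                           # error expression: output minus target
        grad_W = X.T @ err / len(X)           # gradient of 0.5 * mean squared error
        grad_b = err.mean(axis=0)
        W -= lr * grad_W                      # minus sign: step *against* the gradient;
        b -= lr * grad_b                      # writing W += lr * grad_W here diverges instead

    print(W.ravel(), b)                       # approaches [3, 7] and [9]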

If neither of these is the issue, then we would need to see the code to answer.

sidquanto