
I plan to use the Nguyen-Widrow algorithm for a neural network with multiple hidden layers. While researching it, I found a lot of ambiguities that I would like to clarify.

The following is pseudocode for the Nguyen-Widrow algorithm:

      Initialize all weights of the hidden layers with random values
      For each hidden layer {
          beta = 0.7 * Math.pow(hiddenNeurons, 1.0 / numberOfInputs);
          For each synapse {
              For each weight {
                  Adjust the weight by dividing it by the norm of the weights of its neuron
                  and multiplying it by the beta value
              }
          }
      }
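
To make the ambiguity concrete, this is roughly how I would code one layer in Java (just a sketch of my understanding; hiddenNeurons and numInputs are left as parameters because their meaning is exactly what I am unsure about):

    import java.util.Random;

    public class NguyenWidrowSketch {

        // Initializes and scales the incoming weights of ONE layer, in place.
        // weights[j][i] = weight from input i to neuron j of this layer.
        // hiddenNeurons and numInputs are parameters on purpose: whether they
        // refer to the current layer only is the question I am asking.
        static void nguyenWidrow(double[][] weights, int hiddenNeurons, int numInputs, Random rnd) {
            double beta = 0.7 * Math.pow(hiddenNeurons, 1.0 / numInputs);

            for (double[] neuronWeights : weights) {
                // start from small random values
                for (int i = 0; i < neuronWeights.length; i++) {
                    neuronWeights[i] = rnd.nextDouble() - 0.5;   // range [-0.5, 0.5)
                }
                // Euclidean norm of this neuron's weight vector
                double norm = 0.0;
                for (double w : neuronWeights) {
                    norm += w * w;
                }
                norm = Math.sqrt(norm);
                // scale the vector so that its norm becomes beta
                for (int i = 0; i < neuronWeights.length; i++) {
                    neuronWeights[i] *= beta / norm;
                }
            }
        }
    }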

I just wanted to clarify whether the value of hiddenNeurons is the size of the particular hidden layer, or the combined size of all the hidden layers in the network. I got mixed up while viewing various sources.

In other words, if I have a network (3-2-2-2-3) (index 0 is input layer, index 4 is output layer), would the value hiddenNeurons be:

NumberOfNeuronsInLayer(1) + NumberOfNeuronsInLayer(2) + NumberOfNeuronsInLayer(3)

Or just

NumberOfNeuronsInLayer(i), where i is the current layer I am at

EDIT:

So, the hiddenNeurons value would be the size of the current hidden layer, and the input value would be the size of the previous hidden layer?

Goaler444
  • I think the equation relates to every hidden layer; that is, the number of hidden neurons is the number of neurons in the i-th layer (which can vary from one hidden layer to another), and the number of inputs is the number of inputs for the same i-th layer, i.e. the number of neurons in the previous (i-1)-th layer. – Stan Dec 03 '12 at 20:58
  • I'm not sure, to be honest; so many sources say different things that I am mixed up about how the algorithm has to work. Thanks for your reply though :) – Goaler444 Dec 03 '12 at 21:11
  • In fact, one hidden layer is enough in most cases; at least, networks with a larger number of layers can often be reduced to a 1-hidden-layer analogue. This is why the formula is often described in terms of the number of neurons in the hidden layer and (its) inputs. When you add a 2nd hidden layer, the formula applies to it recursively as well, where its inputs are the outputs of the previous layer. – Stan Dec 04 '12 at 18:20
  • So, the hiddenNeurons value would be the size of the current hidden layer, and the input value would be the size of the previous hidden layer? Thanks for your reply by the way :) – Goaler444 Dec 04 '12 at 18:27
  • Hmm, I'm still not sure; despite the fact that most examples only have one hidden layer, when computing the weights between the hidden layer and the output layer it seems that the input size used is that of the input layer. – Goaler444 Dec 04 '12 at 21:06

2 Answers


The Nguyen-Widrow initialization algorithm is the following:

  1. Initialize all weights of the hidden layers with (ranged) random values
  2. For each hidden layer
    2.1 Calculate the beta value: 0.7 * the Nth root of (#neurons of current layer), where N = #neurons of the input layer
    2.2 For each synapse
      2.2.1 For each weight
      2.2.2 Adjust the weight by dividing it by the norm of the weights of its neuron and multiplying it by the beta value

Encog Java Framework
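
Applied to a network with several hidden layers, the steps above could look roughly like the Java sketch below (this is not Encog's actual code; the array layout and names are made up, and it treats the "input layer" of each step as whichever layer feeds the current hidden layer, as the comment below suggests):

    // layerSizes = {3, 2, 2, 2, 3}: index 0 is the input layer, the last index is the output layer.
    // weights[L][j][i] = weight from neuron i of layer L-1 to neuron j of layer L.
    static void nguyenWidrowInit(double[][][] weights, int[] layerSizes, java.util.Random rnd) {
        for (int layer = 1; layer < layerSizes.length - 1; layer++) {  // hidden layers only
            int current = layerSizes[layer];      // #neurons of the current hidden layer
            int inputs  = layerSizes[layer - 1];  // #neurons feeding into it (previous layer)
            double beta = 0.7 * Math.pow(current, 1.0 / inputs);

            for (int j = 0; j < current; j++) {
                double norm = 0.0;
                for (int i = 0; i < inputs; i++) {
                    weights[layer][j][i] = rnd.nextDouble() - 0.5;  // ranged random value
                    norm += weights[layer][j][i] * weights[layer][j][i];
                }
                norm = Math.sqrt(norm);
                for (int i = 0; i < inputs; i++) {
                    weights[layer][j][i] *= beta / norm;            // scale to length beta
                }
            }
        }
    }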

ThiS
  • Yeah, I have seen this pseudocode. So is the number of neurons in the input layer that of the first layer of the NN, and not the previous hidden layer? – Goaler444 Dec 09 '12 at 11:18
  • It's the previous layer's # of neurons (number of features for the very first hidden layer). – Greg Kramida Dec 17 '13 at 14:40

Sounds to me like you want more precise code. Here are some actual code lines from a project I'm participating in. I hope you read C. It's a bit abstracted and simplified. There is a struct nn that holds the neural net data; you probably have your own abstract data type.

Code lines from my project (somewhat simplified):

float *w = nn->the_weight_array;
float factor = 0.7f * powf( (float) nn->n_hidden, 1.0f / nn->n_input );
unsigned int i, n_weights = nn->n_input * nn->n_hidden;

/* Initialize every weight with a ranged random value */
for( i = 0; i < n_weights; i++ )
    *w++ = random_range( -factor, factor );

/* Nguyen/Widrow: scale each group of n_hidden consecutive weights */
w = nn->the_weight_array;
for( i = nn->n_input; i; i-- ){
    _scale_nguyen_widrow( factor, w, nn->n_hidden );
    w += nn->n_hidden;
}

Functions called:

#include <math.h>    /* sqrtf, fabsf (and powf used above) */
#include <stdlib.h>  /* rand, RAND_MAX */

/* Scale a weight vector so that its Euclidean norm becomes 'factor' */
static void _scale_nguyen_widrow( float factor, float *vec, unsigned int size )
{
    unsigned int i;
    float magnitude = 0.0f;
    for ( i = 0; i < size; i++ )
        magnitude += vec[i] * vec[i];

    magnitude = sqrtf( magnitude );

    for ( i = 0; i < size; i++ )
         vec[i] *= factor / magnitude;
}

/* Uniform random float in the range [min, max] */
static inline float random_range( float min, float max )
{
    float range = fabsf( max - min );
    return ((float)rand() / (float)RAND_MAX) * range + min;
}

Tip:
After you've implemented the Nguyen/Widrow weight initialization, you can add a little code to the forward calculation that dumps each activation to a file. Then you can check how well the neurons' activations cover the range of the activation function. Find the mean and standard deviation. You can even plot the distribution with a plotting tool, e.g. gnuplot. (You need a plotting tool like gnuplot anyway for plotting error rates etc.) I did that for my implementation. The plots came out nice, and the initial learning became much faster using Nguyen/Widrow for my project.
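
For instance, the dump and the statistics can be as simple as the following (a rough sketch in Java, since that seems to be what the question uses; the file name and method names are made up):

    import java.io.FileWriter;
    import java.io.IOException;

    public class ActivationLogger {

        // Append one layer's activations to a plain text file, one value per line,
        // so the distribution can be plotted later (e.g. with gnuplot).
        static void dumpActivations(String fileName, double[] activations) throws IOException {
            try (FileWriter out = new FileWriter(fileName, true)) {   // true = append
                for (double a : activations) {
                    out.write(a + "\n");
                }
            }
        }

        // Mean and standard deviation of the logged activations.
        static double[] meanAndStdDev(double[] values) {
            double mean = 0.0;
            for (double v : values) mean += v;
            mean /= values.length;

            double variance = 0.0;
            for (double v : values) variance += (v - mean) * (v - mean);
            variance /= values.length;

            return new double[] { mean, Math.sqrt(variance) };
        }
    }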

PS: I'm not sure my implementation is correct according to Nguyen and Widrow's intentions. I don't even think I care, as long as it improves the initial learning.

Good luck,
-Øystein

  • Firstly, thanks for your reply. The algorithm goes along those lines, but if I read your code correctly, your neural network only has one hidden layer. My question is how to use this algorithm for neural networks that have more than one hidden layer. More specifically, when getting the size of the hidden layer for the algorithm (in this case your nn->n_hidden), does the value have to be the size of the current hidden layer I am at (the i-th layer), or the size of all the hidden layers combined? I am not looking for code, more just for clarification that I am using the algorithm correctly. – Goaler444 Dec 12 '12 at 18:20
  • Furthermore, is the number of neurons in the input layer the size of the first layer of the NN, and not the previous hidden layer? – Goaler444 Dec 12 '12 at 18:22
  • Right, my neural net has only one hidden layer. However, I would do the same thing for all hidden layers. Keep in mind that this is just an empirical technique in the first place, so it doesn't matter much which range you use in the next layer of hidden nodes. I would probably have used the same factor for all layers. The important thing for you is to actually test. Please do what I describe in the tip section: store the activations at each level to a file, and calculate the mean and standard deviation. I promise: that will shed some light on your initialization algorithm! – Øystein Schønning-Johansen Dec 13 '12 at 10:28