14

I have been following Andrew Ng's videos on neural networks. In these videos, he doesn't associate a bias with each and every neuron. Instead, he adds a bias unit at the head of every layer after its activations have been computed, and uses this bias along with those activations to calculate the activations of the next layer (forward propagation).

However, in some other blogs on machine learning and videos like this, a bias is associated with each neuron. What is the reason for this difference, and what are its implications?

– RaviTej310

2 Answers

10

Both approaches represent the same bias concept. For each unit (excluding input nodes) you compute the value of the activation function applied to the dot product of the weight vector and the vector of activations from the previous layer (in the case of a feed-forward network), plus a scalar bias value:

 (w * a) + b

In Andrew Ng's course this value is computed using a vectorisation trick in which you concatenate your activations with a specified bias constant (usually 1), and that does the job (because this constant has its own weight for each node in the next layer, so it is exactly the same as having a separate bias value for each node).
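As a rough NumPy sketch of that equivalence (the variable names and sizes here are mine, not from the course), you can check that folding the biases into an extra weight column acting on a constant 1 gives the same pre-activations as an explicit per-neuron bias vector:

    import numpy as np

    rng = np.random.default_rng(0)
    a = rng.standard_normal(3)          # activations of the previous layer (3 units)
    W = rng.standard_normal((2, 3))     # weights into the next layer (2 units)
    b = rng.standard_normal(2)          # one bias per neuron in the next layer

    # Convention 1: explicit per-neuron bias vector
    z_explicit = W @ a + b

    # Convention 2 (vectorisation trick): prepend a constant 1 to the
    # activations and fold the biases in as an extra weight column
    a_aug = np.concatenate(([1.0], a))      # [1, a1, a2, a3]
    W_aug = np.hstack((b[:, None], W))      # bias weights form the first column
    z_vectorised = W_aug @ a_aug

    print(np.allclose(z_explicit, z_vectorised))  # True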

– Marcin Możejko
  • But in Andrew Ng's course, if we add a single bias, won't all neurons in the next layer have the same bias? This would not be the case if we initialized a bias for each neuron, because we could initialize different biases for different neurons. – RaviTej310 May 13 '16 at 05:35
  • 5
    The bias value is the same - but every node has different weight for it. So if e.g. some node has a bias weight w_0 and bias constant is a_0 then corresponding bias value is equal to w_0 * a_0. You can adjust every bias value simply by learning a correct weight w_0. – Marcin Możejko May 13 '16 at 07:38
  • Why must the bias unit be added only at the start of the layer? I.e., why must the ones be prepended to the activation vector rather than appended at the end? – chia yongkang Mar 06 '20 at 14:38
1

Regarding the differences between the two, @Marcin has answered them beautifully.

It's interesting that in his Deep Learning specialization by deeplearning.ai, Andrew takes a different approach from his Machine Learning course (where he used one bias unit per layer) and associates a bias term with each individual neuron.

Though both approaches achieve the same result, in my opinion, associating a bias with each neuron is much more explicit and helps immensely with hyperparameter tuning, especially when you're dealing with large architectures like CNNs and deep neural networks.
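As a rough illustration only (the layer sizes, function names, and initialisation below are my own, not taken from either course), the explicit convention keeps each layer's weight matrix and bias vector as separate parameters:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(x, parameters):
        # Each layer carries its own weight matrix W and bias vector b,
        # with one bias entry per neuron in that layer.
        a = x
        for W, b in parameters:
            a = sigmoid(W @ a + b)
        return a

    rng = np.random.default_rng(1)
    layer_sizes = [4, 5, 3, 1]          # hypothetical architecture
    parameters = [
        (rng.standard_normal((n_out, n_in)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]

    x = rng.standard_normal(4)
    print(forward(x, parameters))       # activation of the single output unit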

– Aditya Saini