If there is a layer after batchnorm, then we don't need a bias term, because the output of batchnorm is already unbiased. OK, but if the sequence of layers is the following:

... -> batchnorm -> relu -> convlayer

then the output of the ReLU is not normalized. Why is it still common not to include a bias in that last layer?
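
For concreteness, here is a minimal sketch of the pattern I mean, assuming PyTorch (the channel sizes and kernel size are arbitrary placeholders):

```python
import torch.nn as nn

# batchnorm -> relu -> convlayer, with the conv's bias switched off
block = nn.Sequential(
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 128, kernel_size=3, padding=1, bias=False),
)
```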

Kunj Mehta
qwenty
  • Clarification: by _last layer_, do you mean the convlayer or the last layer of your net (which shouldn't be a convlayer)? – bene Jun 21 '19 at 11:23

1 Answer

Adding biases increases the total number of parameters, which can be a tricky thing in a large model and can affect convergence and learning.
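
To see where the extra parameters come from, here is a quick sketch, assuming PyTorch (the channel sizes are arbitrary); the bias adds one parameter per output channel on top of the weight tensor:

```python
import torch.nn as nn

conv_bias = nn.Conv2d(64, 128, kernel_size=3, bias=True)
conv_nobias = nn.Conv2d(64, 128, kernel_size=3, bias=False)

def n_params(module):
    return sum(p.numel() for p in module.parameters())

print(n_params(conv_bias))    # 64*128*3*3 + 128 = 73856
print(n_params(conv_nobias))  # 64*128*3*3       = 73728
```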

"In a large model, removing the bias inputs makes very little difference because each node can make a bias node out of the average activation of all of its inputs, which by the law of large numbers will be roughly normal."

ReLU(x) = max(0, x) itself adds a non-linearity to the model, so a bias can be somewhat unnecessary at this point, especially in a deep network. Adding a bias on top of that can also affect the variance of the model's output and may lead to overfitting.
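
As a toy illustration (a NumPy sketch with made-up numbers): ReLU clamps negative inputs to zero, and a bias in the following layer just adds the same constant to every output it produces:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# toy "1x1 conv" on one channel: y = w * relu(x) (+ b)
x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
w, b = 0.5, 0.7

print(w * relu(x))      # [0.   0.   0.   0.75 1.5 ]
print(w * relu(x) + b)  # [0.7  0.7  0.7  1.45 2.2 ]  -- every output shifted by b
```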

Read this: Does bias in the convolutional layer really make a difference to the test accuracy?

and this: http://neuralnetworksanddeeplearning.com/chap6.html#introducing_convolutional_networks

Sushant
  • I think this is good too: https://stats.stackexchange.com/questions/185911/why-are-bias-nodes-used-in-neural-networks – bene Jun 21 '19 at 14:02