If there is a layer after batchnorm, then we don't need a bias term, because the output of batchnorm is already unbiased. OK, but if the sequence of layers is the following:

... -> batchnorm -> relu -> convlayer

then the output of the ReLU is not normalized. Why is it still common not to include a bias in that last layer?
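
For concreteness, here is a minimal sketch of the pattern I mean, assuming PyTorch (the channel sizes and kernel size are arbitrary placeholders):

```python
import torch.nn as nn

# batchnorm -> relu -> convlayer, with the conv's bias switched off
block = nn.Sequential(
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 128, kernel_size=3, padding=1, bias=False),
)
```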

Kunj Mehta
qwenty
  • Clarification: by _last layer_, do you mean the convlayer or the last layer of your net (which shouldn't be a convlayer)? – bene Jun 21 '19 at 11:23

1 Answer

Adding biases increases the total number of parameters, which can be a tricky thing in a large model and can affect convergence and learning.
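
To see where the extra parameters come from, here is a quick sketch, assuming PyTorch (the channel sizes are arbitrary); the bias adds one parameter per output channel on top of the weight tensor:

```python
import torch.nn as nn

conv_bias = nn.Conv2d(64, 128, kernel_size=3, bias=True)
conv_nobias = nn.Conv2d(64, 128, kernel_size=3, bias=False)

def n_params(module):
    return sum(p.numel() for p in module.parameters())

print(n_params(conv_bias))    # 64*128*3*3 + 128 = 73856
print(n_params(conv_nobias))  # 64*128*3*3       = 73728
```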

"In a large model, removing the bias inputs makes very little difference because each node can make a bias node out of the average activation of all of its inputs, which by the law of large numbers will be roughly normal."

ReLU(x) = max(0, x) itself adds a non-linearity to the model, so a bias can be somewhat unnecessary at this point, especially in a deep network. Adding a bias on top of that can also affect the variance of the model's output and may lead to overfitting.
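
As a toy illustration (a NumPy sketch with made-up numbers): ReLU clamps negative inputs to zero, and a bias in the following layer just adds the same constant to every output it produces:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# toy "1x1 conv" on one channel: y = w * relu(x) (+ b)
x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
w, b = 0.5, 0.7

print(w * relu(x))      # [0.   0.   0.   0.75 1.5 ]
print(w * relu(x) + b)  # [0.7  0.7  0.7  1.45 2.2 ]  -- every output shifted by b
```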

Read this: Does bias in the convolutional layer really make a difference to the test accuracy?

and this: http://neuralnetworksanddeeplearning.com/chap6.html#introducing_convolutional_networks

Sushant
  • I think this is good too: https://stats.stackexchange.com/questions/185911/why-are-bias-nodes-used-in-neural-networks – bene Jun 21 '19 at 14:02