def __init__(self):
    super().__init__()

    self.conv = nn.Sequential(
        nn.Conv2d(32, 64, kernel_size=5, stride=2),
        nn.BatchNorm2d(64),
        nn.ReLU(),

        nn.Conv2d(64, 64, kernel_size=3, stride=2),
        nn.BatchNorm2d(64),
        nn.ReLU(),
        
        nn.Conv2d(64, 64, kernel_size=3, stride=2),
        nn.BatchNorm2d(64),
        nn.ReLU(),

        nn.Conv2d(64, 64, kernel_size=3, stride=2),
        nn.BatchNorm2d(64),
        nn.ReLU(),

        nn.Conv2d(64, 64, kernel_size=3, stride=2),
        nn.BatchNorm2d(64),

        nn.AvgPool2d(kernel_size=2)  # kernel_size is required; 2 matches the 2x2 feature map these convolutions produce from a 110x110 input
    )

    conv_out_size = self._get_conv_out((32, 110, 110))

    self.fc = nn.Sequential(
        nn.Linear(conv_out_size, 1),
        nn.Sigmoid(),
    )

I have this model, and everything looks fine to my eyes. However, it says that I have to remove the bias from a convolution if the convolution is followed by a normalization layer, because the normalization already contains its own bias parameter. Can you explain why, and how I can do that?

David

3 Answers

Batch normalization computes gamma * (x - mean) / sqrt(var + eps) + beta, so it already has its own learnable bias (beta). If the convolution output is y = W*x + b, then mean(y) = mean(W*x) + b, and the mean subtraction cancels the convolution's bias entirely; keeping it only adds a useless parameter.
You can simply pass bias=False to your convolution layers to avoid this redundancy (the default value for bias is True in PyTorch).
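
For example, here is a minimal sketch of the first block from the question with the bias removed (channel sizes taken from the question's code):

import torch.nn as nn

block = nn.Sequential(
    # bias=False: BatchNorm's beta already acts as a learnable per-channel bias
    nn.Conv2d(32, 64, kernel_size=5, stride=2, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(),
)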

The answer is already accepted, but I would still like to add a point here. One of the advantages of batch normalization is that it can be folded into the preceding convolution layer. This means we can replace a convolution followed by batch normalization with a single convolution that has different weights. Folding batch normalization is good practice, and you can refer to the link here: Folding Batch Norm.

I have also written a small Python (Keras) function for your understanding. Kindly check it.

import numpy as np

def fold_batch_norm(conv_layer, bn_layer):
    """Fold the batch normalization parameters into the weights of
    the previous convolution layer."""
    conv_weights = conv_layer.get_weights()[0]

    # Keras stores the learnable weights for a BatchNormalization layer
    # as four separate arrays:
    #   0 = gamma (if scale == True)
    #   1 = beta (if center == True)
    #   2 = moving mean
    #   3 = moving variance
    bn_weights = bn_layer.get_weights()
    gamma = bn_weights[0]
    beta = bn_weights[1]
    mean = bn_weights[2]
    variance = bn_weights[3]

    epsilon = 1e-7  # ideally this should match bn_layer.epsilon (the Keras default is 1e-3)
    new_weights = conv_weights * gamma / np.sqrt(variance + epsilon)
    param = conv_layer.get_config()

    # Handle both cases: with and without a bias in the convolution
    if param['use_bias']:
        bias = conv_layer.get_weights()[1]
        new_bias = beta + (bias - mean) * gamma / np.sqrt(variance + epsilon)
    else:
        new_bias = beta - mean * gamma / np.sqrt(variance + epsilon)
    return new_weights, new_bias
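
A minimal usage sketch (assuming conv_layer and bn_layer come from a trained Keras model whose convolution was built with use_bias=True, so the layer has a slot for the folded bias):

new_weights, new_bias = fold_batch_norm(conv_layer, bn_layer)
conv_layer.set_weights([new_weights, new_bias])
# The BatchNormalization layer can then be dropped for inference.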

You can consider this idea in your future projects as well. Cheers :)

Saurav Rai

If the pre-trained network does not use a bias in its Conv2D layers (use_bias=False), folding batch norm requires the folded layer to have one.
Is there an easy way to change the use_bias config in a pre-trained Keras network?

layer.set_weights(fold_batch_norm(..)) won't work, since the original weights didn't include a bias.

hmedu