
I want to know how BatchNormalization works in Keras, so I wrote the following code:

import numpy as np
import keras

X_input = keras.Input((2,))
X = keras.layers.BatchNormalization(axis=1)(X_input)
model1 = keras.Model(inputs=X_input, outputs=X)

The input is a batch of two-dimensional vectors, normalized along axis=1; then I print the output:

a = np.arange(4).reshape((2,2))
print('a=')
print(a)
print('output=')
print(model1.predict(a,batch_size=2))

and the output is:

a=
array([[0, 1],
   [2, 3]])
output=
array([[ 0.        ,  0.99950039],
   [ 1.99900079,  2.9985013 ]], dtype=float32)

I cannot figure out this result. As far as I know, the mean of the batch should be ([0,1] + [2,3])/2 = [1,2], and the variance 1/2*(([0,1] - [1,2])^2 + ([2,3] - [1,2])^2) = [1,1]. Normalizing with (x - mean)/sqrt(var) should therefore give [-1, -1] and [1, 1]. Where am I wrong?
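
For reference, here is the manual computation I expected, sketched in NumPy:

import numpy as np

a = np.arange(4).reshape((2, 2)).astype(np.float32)
mean = a.mean(axis=0)   # batch mean over axis 0: [1., 2.]
var = a.var(axis=0)     # batch variance over axis 0: [1., 1.]
print((a - mean) / np.sqrt(var))  # [[-1., -1.], [1., 1.]]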

Sikai Yao

1 Answer


BatchNormalization subtracts a mean, divides by the square root of a variance, then applies a scale factor gamma and an offset beta. If the mean and variance were actually those of your batch, the result would be centered around zero with variance 1.
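
As a sketch of the standard transform (epsilon is a small numerical-stability constant, covered below):

import numpy as np

def batch_norm(x, mean, var, gamma, beta, epsilon=1e-3):
    # Normalize per feature, then scale and shift
    return gamma * (x - mean) / np.sqrt(var + epsilon) + beta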

But they are not. The Keras BatchNormalization layer stores these as layer weights: the trainable beta and gamma, and the non-trainable moving_mean and moving_variance. They are initialized as beta=0, gamma=1, moving_mean=0 and moving_variance=1. At inference time (model.predict), the layer uses these stored moving statistics rather than the statistics of the current batch, and since you haven't run any training steps, they still hold their initial values, so BatchNorm barely changes your values.
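
You can inspect these weights on the untrained model yourself; for the model1 from the question, get_weights() on the BatchNormalization layer returns gamma, beta, moving_mean and moving_variance:

# model1.layers[0] is the InputLayer, layers[1] the BatchNormalization layer
gamma, beta, moving_mean, moving_variance = model1.layers[1].get_weights()
print(gamma, beta, moving_mean, moving_variance)
# [1. 1.] [0. 0.] [0. 0.] [1. 1.]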

So why don't you get back exactly your input values? Because of another parameter, epsilon (a small number), which is added to the variance for numerical stability. All values are therefore divided by sqrt(1 + epsilon) and end up slightly below their input values.
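
With the Keras default epsilon=0.001 this reproduces your output exactly (a quick check in NumPy, using the initial moving_mean=0, moving_variance=1, gamma=1, beta=0):

import numpy as np

epsilon = 1e-3  # Keras default for BatchNormalization
a = np.arange(4).reshape((2, 2)).astype(np.float32)
print(a / np.sqrt(1.0 + epsilon))
# [[0.         0.9995004]
#  [1.9990008  2.9985013]]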

YSelf
  • I still don't understand why the input values do not change. With normalization I get [-1, -1] and [1, 1], and even if the layer multiplies by gamma and adds beta, I cannot get the same numbers. I also tried batch_size=4 with 4 random input vectors and still get the same behavior. – Sikai Yao Nov 30 '17 at 10:28
  • BatchNormalization does not calculate the mean and variance the way you do; at prediction time these are (stored) parameters. Since you have not trained this layer, they still have their initial values of mean=0 and variance=1. – YSelf Nov 30 '17 at 10:39