I am new to deep learning and Keras. I am currently trying to implement a U-Net-like CNN, and now I want to include batch normalization layers in my non-sequential model, but I do not really know how.
This is my current attempt to include it:
from keras.models import Model
from keras.layers import Input, Conv2D, Conv2DTranspose, BatchNormalization, Dropout, concatenate

input_1 = Input((X_train.shape[1], X_train.shape[2], X_train.shape[3]))
conv1 = Conv2D(16, (3,3), strides=(2,2), activation='relu', padding='same')(input_1)
batch1 = BatchNormalization(axis=3)(conv1)
conv2 = Conv2D(32, (3,3), strides=(2,2), activation='relu', padding='same')(batch1)
batch2 = BatchNormalization(axis=3)(conv2)
conv3 = Conv2D(64, (3,3), strides=(2,2), activation='relu', padding='same')(batch2)
batch3 = BatchNormalization(axis=3)(conv3)
conv4 = Conv2D(128, (3,3), strides=(2,2), activation='relu', padding='same')(batch3)
batch4 = BatchNormalization(axis=3)(conv4)
conv5 = Conv2D(256, (3,3), strides=(2,2), activation='relu', padding='same')(batch4)
batch5 = BatchNormalization(axis=3)(conv5)
conv6 = Conv2D(512, (3,3), strides=(2,2), activation='relu', padding='same')(batch5)
drop1 = Dropout(0.25)(conv6)
upconv1 = Conv2DTranspose(256, (3,3), strides=(1,1), padding='same')(drop1)
upconv2 = Conv2DTranspose(128, (3,3), strides=(2,2), padding='same')(upconv1)
upconv3 = Conv2DTranspose(64, (3,3), strides=(2,2), padding='same')(upconv2)
upconv4 = Conv2DTranspose(32, (3,3), strides=(2,2), padding='same')(upconv3)
upconv5 = Conv2DTranspose(16, (3,3), strides=(2,2), padding='same')(upconv4)
upconv5_1 = concatenate([upconv5,conv2], axis=3)
upconv6 = Conv2DTranspose(8, (3,3), strides=(2,2), padding='same')(upconv5_1)
upconv6_1 = concatenate([upconv6,conv1], axis=3)
upconv7 = Conv2DTranspose(1, (3,3), strides=(2,2), activation='linear', padding='same')(upconv6_1)
model = Model(outputs=upconv7, inputs=input_1)
Is batch normalization used in the right way here? In the Keras documentation I read that you typically want to normalize the "features axis". This is a short snippet of the model summary:
Layer (type)                     Output Shape          Param #     Connected to
====================================================================================================
input_1 (InputLayer)             (None, 512, 512, 9)   0
____________________________________________________________________________________________________
conv2d_1 (Conv2D)                (None, 256, 256, 16)  1312        input_1[0][0]
____________________________________________________________________________________________________
conv2d_2 (Conv2D)                (None, 128, 128, 32)  4640        conv2d_1[0][0]
____________________________________________________________________________________________________
conv2d_3 (Conv2D)                (None, 64, 64, 64)    18496       conv2d_2[0][0]
____________________________________________________________________________________________________
In this case my features axis is axis 3 (counting from 0), right? I have also read discussions about whether batch normalization should be applied before or after the activation function. In my code it is used after the activation function, right? Is there a way to use it before the activation function?
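From what I understand, that would mean dropping the activation argument from Conv2D and adding a separate Activation layer after the BatchNormalization. This is just my guess at how the first block would then look (it would also need Activation imported from keras.layers):

conv1 = Conv2D(16, (3,3), strides=(2,2), padding='same')(input_1)  # no activation here
batch1 = BatchNormalization(axis=3)(conv1)                          # normalize the pre-activations
act1 = Activation('relu')(batch1)                                   # ReLU applied after normalization
conv2 = Conv2D(32, (3,3), strides=(2,2), padding='same')(act1)      # next block continues from act1

Is that the right way to do it?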
Thank you very much for your help and feedback! Really appreciate it!