I am new to deep learning and Keras. I am currently trying to implement a U-Net-like CNN, and now I want to include batch normalization layers in my non-sequential model, but I do not really know how.
This is my current attempt to include it:
from keras.models import Model
from keras.layers import Input, Conv2D, Conv2DTranspose, BatchNormalization, Dropout, concatenate

input_1 = Input((X_train.shape[1], X_train.shape[2], X_train.shape[3]))
conv1 = Conv2D(16, (3,3), strides=(2,2), activation='relu', padding='same')(input_1)
batch1 = BatchNormalization(axis=3)(conv1)
conv2 = Conv2D(32, (3,3), strides=(2,2), activation='relu', padding='same')(batch1)
batch2 = BatchNormalization(axis=3)(conv2)
conv3 = Conv2D(64, (3,3), strides=(2,2), activation='relu', padding='same')(batch2)
batch3 = BatchNormalization(axis=3)(conv3)
conv4 = Conv2D(128, (3,3), strides=(2,2), activation='relu', padding='same')(batch3)
batch4 = BatchNormalization(axis=3)(conv4)
conv5 = Conv2D(256, (3,3), strides=(2,2), activation='relu', padding='same')(batch4)
batch5 = BatchNormalization(axis=3)(conv5)
conv6 = Conv2D(512, (3,3), strides=(2,2), activation='relu', padding='same')(batch5)
drop1 = Dropout(0.25)(conv6)
upconv1 = Conv2DTranspose(256, (3,3), strides=(1,1), padding='same')(drop1)
upconv2 = Conv2DTranspose(128, (3,3), strides=(2,2), padding='same')(upconv1)
upconv3 = Conv2DTranspose(64, (3,3), strides=(2,2), padding='same')(upconv2)
upconv4 = Conv2DTranspose(32, (3,3), strides=(2,2), padding='same')(upconv3)
upconv5 = Conv2DTranspose(16, (3,3), strides=(2,2), padding='same')(upconv4)
upconv5_1 = concatenate([upconv5,conv2], axis=3)
upconv6 = Conv2DTranspose(8, (3,3), strides=(2,2), padding='same')(upconv5_1)
upconv6_1 = concatenate([upconv6,conv1], axis=3)
upconv7 = Conv2DTranspose(1, (3,3), strides=(2,2), activation='linear', padding='same')(upconv6_1)
model = Model(outputs=upconv7, inputs=input_1)
Is batch normalization used in the right way here? In the Keras documentation I read that you typically want to normalize the "features axis". This is a short snippet of the model summary:
Layer (type)                     Output Shape          Param #     Connected to
====================================================================================================
input_1 (InputLayer)             (None, 512, 512, 9)   0
____________________________________________________________________________________________________
conv2d_1 (Conv2D)                (None, 256, 256, 16)  1312        input_1[0][0]
____________________________________________________________________________________________________
conv2d_2 (Conv2D)                (None, 128, 128, 32)  4640        conv2d_1[0][0]
____________________________________________________________________________________________________
conv2d_3 (Conv2D)                (None, 64, 64, 64)    18496       conv2d_2[0][0]
____________________________________________________________________________________________________
In this case my features axis is axis 3 (counting from 0), right? I have also read discussions about whether batch normalization should be applied before or after the activation function. In my code it is used after the activation function, right? Is there a way to use it before the activation function?
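From what I understand, that would mean dropping the activation argument from Conv2D and adding a separate Activation layer after the BatchNormalization. This is just my guess at how the first block would then look (it would also need Activation imported from keras.layers):

conv1 = Conv2D(16, (3,3), strides=(2,2), padding='same')(input_1)  # no activation here
batch1 = BatchNormalization(axis=3)(conv1)                          # normalize the pre-activations
act1 = Activation('relu')(batch1)                                   # ReLU applied after normalization
conv2 = Conv2D(32, (3,3), strides=(2,2), padding='same')(act1)      # next block continues from act1

Is that the right way to do it?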
Thank you very much for your help and feedback! Really appreciate it!