Consider the following code snippet:

from keras import models, layers
from keras.layers import BatchNormalization

model = models.Sequential()
model.add(layers.Dense(256, activation='relu'))  # Layer 1
model.add(BatchNormalization())
model.add(layers.Dense(128, activation='relu'))  # Layer 2
I am using Keras with the TensorFlow backend.
My question is: in Keras's implementation, is BN performed before or after the activation function?
To add more clarity: whether BN SHOULD be applied before or after the activation is itself subject to debate. The original paper (Ioffe and Szegedy, 2015) suggests BEFORE, i.e. relu(BN(Wx + b)) rather than BN(relu(Wx + b)), but the answers in the following thread show diverse opinions: Ordering of batch normalization and dropout?
In the Keras documentation (https://keras.io/layers/normalization/), it says: "Normalize the activations of the previous layer at each batch, i.e. applies a transformation that maintains the mean activation close to 0 and the activation standard deviation close to 1."
The Keras documentation seems to suggest that BN is applied AFTER the activation (i.e. in the example code above, BN is applied after the 'relu' of Layer 1). Could someone confirm whether this is the case?
In addition, is it possible to configure whether BN is applied before or after the activation function? See the sketch below for what I have in mind.
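My current guess is that the ordering can be controlled by leaving the activation off the Dense layer and adding it as a separate Activation layer after BN. A minimal sketch of that approach, assuming this is indeed how it is meant to be done in Keras:

from keras import models, layers
from keras.layers import Activation, BatchNormalization

# Sketch: apply BN BEFORE the activation by splitting Dense and Activation
model = models.Sequential()
model.add(layers.Dense(256))                     # Layer 1: linear part only, no activation
model.add(BatchNormalization())                  # BN on the pre-activation values
model.add(Activation('relu'))                    # non-linearity applied after BN
model.add(layers.Dense(128, activation='relu'))  # Layer 2

Is splitting the layers like this the recommended way, or is there a built-in option on BatchNormalization itself?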
Thanks!