But dropout layers usually create the opposite effect, making the loss on evaluation less than the loss during training.
Not necessarily! Although some of the neurons are dropped in a dropout layer, bear in mind that the remaining outputs are scaled up accordingly (in Keras, by 1 / (1 - rate)) so that the expected activation stays the same. At inference time (i.e. test time) dropout is removed entirely, and considering that you have trained your model for only one epoch, the behavior you saw may happen. Don't forget that since you are training the model for just one epoch, only a portion of the neurons have been dropped in the dropout layer, but all of them are present at inference time.
If you continue training the model for more epochs, you might expect the training loss and the test loss (on the same data) to become more or less the same.
Experiment with it yourself: just remove the Dropout layer(s), or set their rate to 0 (note that setting trainable to False has no effect on Dropout, since it has no weights), and see whether this happens or not.
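As a minimal sketch of that scaling behavior (my own addition, not part of the original example, and assuming a recent TensorFlow-backed Keras where layers can be called eagerly on NumPy arrays), you can apply a Dropout layer to a constant input in training and in inference mode and compare the outputs:
import numpy as np
from keras import layers

x = np.ones((1, 10), dtype='float32')
dropout = layers.Dropout(rate=0.5)

# training=True: roughly half of the values are zeroed and the survivors are
# scaled up by 1 / (1 - rate) = 2, so the expected activation stays the same
print(dropout(x, training=True))

# training=False (what evaluate()/predict() use): dropout is a no-op and the
# input passes through unchanged
print(dropout(x, training=False))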
One may be confused (as I was) by seeing that, after one epoch of training, the training loss is not equal to the evaluation loss on the same batch of data. And this is not specific to models with Dropout
or BatchNormalization
layers. Consider this example:
from keras import layers, models
import numpy as np

# A simple model with no Dropout or BatchNormalization layers
model = models.Sequential()
model.add(layers.Dense(1000, activation='relu', input_dim=100))
model.add(layers.Dense(1))
model.compile(loss='mse', optimizer='adam')

# One batch of random data, used both for training and for evaluation
x = np.random.rand(32, 100)
y = np.random.rand(32, 1)

print("Training:")
model.fit(x, y, batch_size=32, epochs=1)

print("\nEvaluation:")
loss = model.evaluate(x, y)
print(loss)
The output:
Training:
Epoch 1/1
32/32 [==============================] - 0s 7ms/step - loss: 0.1520
Evaluation:
32/32 [==============================] - 0s 2ms/step
0.7577340602874756
So why are the losses different if they have been computed over the same data, i.e. 0.1520 != 0.7577?
If you ask this, it's because you, like me, have not paid enough attention: that 0.1520 is the loss before updating the parameters of the model (i.e. before doing the backward pass, or backpropagation), and 0.7577 is the loss after the weights of the model have been updated. Even though the data used is the same, the state of the model when computing those loss values is not the same. (Another question: so why has the loss increased after backpropagation? Simply because you have trained it for only one epoch, so the weight updates are not stable enough yet.)
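If you want to double-check this, here is a small sketch (my own addition, not part of the original example) that evaluates the model once before fit() and once after it; with a single batch and no Dropout or BatchNormalization layers, the first number should (almost) coincide with the training loss reported by fit(), while the last one differs:
import numpy as np
from keras import layers, models

model = models.Sequential()
model.add(layers.Dense(1000, activation='relu', input_dim=100))
model.add(layers.Dense(1))
model.compile(loss='mse', optimizer='adam')

x = np.random.rand(32, 100)
y = np.random.rand(32, 1)

# Loss computed with the initial weights, i.e. before any update
loss_before = model.evaluate(x, y, verbose=0)

# fit() reports the loss computed on the forward pass of the (single) batch,
# i.e. still with the initial weights
history = model.fit(x, y, batch_size=32, epochs=1, verbose=0)

# Loss computed with the updated weights
loss_after = model.evaluate(x, y, verbose=0)

print(loss_before, history.history['loss'][0], loss_after)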
To confirm this, you can also use the same data batch as the validation data:
model.fit(x, y, batch_size=32, epochs=1, validation_data=(x,y))
If you run the code above with this modified line, you will get an output like this (obviously the exact values may be different for you):
Training:
Train on 32 samples, validate on 32 samples
Epoch 1/1
32/32 [==============================] - 0s 15ms/step - loss: 0.1273 - val_loss: 0.5344
Evaluation:
32/32 [==============================] - 0s 89us/step
0.5344240665435791
You see that the validation loss and the evaluation loss are exactly the same: that's because the validation is performed at the end of the epoch (i.e. when the model weights have already been updated).
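Finally, coming back to the earlier point that training for more epochs brings the two numbers together: a rough sketch (again my own addition, reusing the same toy setup) shows that after enough epochs the training loss of the last epoch and the evaluation loss on the same data end up very close, since the weight updates become smaller and smaller as the loss settles:
import numpy as np
from keras import layers, models

model = models.Sequential()
model.add(layers.Dense(1000, activation='relu', input_dim=100))
model.add(layers.Dense(1))
model.compile(loss='mse', optimizer='adam')

x = np.random.rand(32, 100)
y = np.random.rand(32, 1)

# Train for many epochs instead of just one
history = model.fit(x, y, batch_size=32, epochs=200, verbose=0)

# The last reported training loss and the evaluation loss are now very close
print("training loss (last epoch):", history.history['loss'][-1])
print("evaluation loss:", model.evaluate(x, y, verbose=0))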