import keras

input0 = keras.layers.Input((32, 32, 3), name='Input0')
flatten = keras.layers.Flatten(name='Flatten')(input0)
relu1 = keras.layers.Dense(256, activation='relu', name='ReLU1')(flatten)
dropout = keras.layers.Dropout(1., name='Dropout')(relu1)  # dropout rate deliberately set to 1.0
softmax2 = keras.layers.Dense(10, activation='softmax', name='Softmax2')(dropout)
model = keras.models.Model(inputs=input0, outputs=softmax2, name='cifar')

Just to test whether dropout is working,

I set the dropout rate to 1.0.

The model state in each epoch should therefore be frozen, with no updates to the parameters at all.

However, the accuracy keeps growing, even though I drop all the hidden nodes.

What's wrong?

  • It looks like there was a bug at some point in time that could explain this: https://github.com/tensorflow/tensorflow/issues/10845 . See if you have versions of tensorflow/keras in which this wasn't fixed yet? Otherwise, try printing values of various parameters, see if you can figure out which ones are still training. Maybe dropout does not affect bias weights for example? Then it could still learn something? Just a random idea, I'm not familiar with the implementation details – Dennis Soemers Jan 20 '18 at 17:10
  • I can confirm the same behavior with Keras 2.0.9 & Tensorflow 1.3.0, as well as with the latest stable Keras 2.1.3 & Tensorflow 1.4.1, even with `use_bias=False` in the `Dense` layer. – desertnaut Jan 20 '18 at 17:46
  • 5
  • The `Dropout` layer simply doesn't do anything when `rate` is set to 1 (or 0, see [here](https://github.com/keras-team/keras/blob/master/keras/layers/core.py#L116)). I guess it's because the scaling factor in inverted dropout goes to infinity when `rate=1`. It'll be better to print a warning in this case, though. – Yu-Yang Jan 20 '18 at 18:32
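
A quick way to act on Dennis Soemers' suggestion above of printing parameter values is to snapshot the weights before and after a short training run and compare them. Below is a minimal sketch, assuming `model`, `x_train` and `y_train` from the question's setup are already defined and using an arbitrary optimizer:

import numpy as np

# copy the weights before training (list of NumPy arrays)
weights_before = [w.copy() for w in model.get_weights()]

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=1, batch_size=128, verbose=0)

# if dropout really froze the network, every weight tensor should be unchanged
for i, (before, after) in enumerate(zip(weights_before, model.get_weights())):
    print('tensor %d changed: %s' % (i, not np.allclose(before, after)))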

1 Answer


Nice catch!

It would seem that the issue linked in Dennis Soemers' comment above, "Keras Dropout layer changes results with dropout=0.0", has not been fully resolved, and Keras somehow blunders when faced with a dropout rate of 1.0 [see UPDATE at the end of the post]; modifying the model shown in the Keras MNIST MLP example:

from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import RMSprop

# data loading / preprocessing (x_train, y_train, num_classes, etc.)
# as in the Keras MNIST MLP example

model = Sequential()
model.add(Dense(512, activation='relu', use_bias=False, input_shape=(784,)))
model.add(Dropout(1.0))
model.add(Dense(512, activation='relu'))
model.add(Dropout(1.0))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=128,
          epochs=3,
          verbose=1,
          validation_data=(x_test, y_test))

indeed gives a model that is being trained, despite all neurons being dropped, as you report:

Train on 60000 samples, validate on 10000 samples
Epoch 1/3
60000/60000 [==============================] - 15s 251us/step - loss: 0.2180 - acc: 0.9324 - val_loss: 0.1072 - val_acc: 0.9654
Epoch 2/3
60000/60000 [==============================] - 15s 246us/step - loss: 0.0831 - acc: 0.9743 - val_loss: 0.0719 - val_acc: 0.9788
Epoch 3/3
60000/60000 [==============================] - 15s 245us/step - loss: 0.0526 - acc: 0.9837 - val_loss: 0.0997 - val_acc: 0.9723

Nevertheless, if you try a dropout rate of 0.99, i.e. replacing the two dropout layers in the above model with

model.add(Dropout(0.99))

then there is indeed effectively no training taking place, as should be the case:

Train on 60000 samples, validate on 10000 samples
Epoch 1/3
60000/60000 [==============================] - 16s 265us/step - loss: 3.4344 - acc: 0.1064 - val_loss: 2.3008 - val_acc: 0.1136
Epoch 2/3
60000/60000 [==============================] - 16s 261us/step - loss: 2.3342 - acc: 0.1112 - val_loss: 2.3010 - val_acc: 0.1135
Epoch 3/3
60000/60000 [==============================] - 16s 266us/step - loss: 2.3167 - acc: 0.1122 - val_loss: 2.3010 - val_acc: 0.1135

UPDATE (after Yu-Yang's comment on the question): It seems to have been a design choice (dead link now, see update below) not to do anything when the dropout rate is equal to either 0 or 1; the Dropout class becomes effective only

if 0. < self.rate < 1.

Nevertheless, as already commented, a warning message in such cases (and a relevant note in the documentation) would arguably be a good idea.
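
Yu-Yang's guess about the scaling factor can be made concrete: inverted dropout rescales the surviving activations by 1 / (1 - rate) at training time, and that factor diverges as the rate approaches 1, which presumably motivates the 0 < rate < 1 guard. A toy illustration of the idea (not Keras' actual implementation):

import numpy as np

def inverted_dropout(x, rate):
    """Zero out units with probability `rate` and rescale the survivors
    by 1 / (1 - rate), so the expected activation is preserved."""
    keep_prob = 1.0 - rate
    mask = np.random.rand(*x.shape) < keep_prob
    return x * mask / keep_prob  # divides by zero when rate == 1

x = np.ones(8)
print(inverted_dropout(x, 0.5))   # survivors scaled by 2
print(inverted_dropout(x, 0.99))  # survivors (if any) scaled by 100
# inverted_dropout(x, 1.0) would divide by zero, hence the special-casing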

UPDATE (July 2021): There have been some changes since Jan 2018, when this answer was written; now, under the hood, Keras calls tf.nn.dropout, which does not seem to allow for a dropout rate of 1 (source).
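
If you want to verify this in your own environment (the exact behaviour may differ across TF versions), a quick check along these lines should show rate=1 being rejected:

import tensorflow as tf

x = tf.ones((2, 4))
print(tf.nn.dropout(x, rate=0.5))  # works: some entries zeroed, survivors scaled by 2

try:
    tf.nn.dropout(x, rate=1.0)     # recent TF versions require rate in [0, 1)
except ValueError as e:
    print('rate=1.0 rejected:', e)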

  • The line of code Yu-Yang linked to in his comment to the question explains why this happens – Dennis Soemers Jan 20 '18 at 18:43
  • 1
  • @DennisSoemers indeed - saw it just after posting. I'll update the answer, but I think in any case a demonstration that it works as expected with a dropout rate high enough but still less than 1, such as 0.99, has some merit in itself... – desertnaut Jan 20 '18 at 18:45
  • 1
  • Yes, of course. Just figured it'd be useful to also include the explanation in the answer – Dennis Soemers Jan 20 '18 at 18:48
  • Excuse me, I needed to disable dropout because I was getting underfitting, so I set dropout=0.0, but then got overfitting after some epochs. My data is large, not simple. Can I try setting dropout=1? – user5520049 Jul 21 '21 at 16:10
  • @user5520049 there have been some changes since Jan 2018 when the answer was written; now, under the hood, [Keras calls `tf.nn.dropout`](https://github.com/keras-team/keras/blob/master/keras/layers/core/dropout.py#L116), which does not seem to allow for `dropout=1` ([source](https://github.com/tensorflow/tensorflow/blob/v2.5.0/tensorflow/python/ops/nn_ops.py#L5185)). If you have further issues, please open a new question with the details. – desertnaut Jul 22 '21 at 08:42