
I'm trying to learn the inner workings of dropout regularization in neural networks. I'm largely working from "Deep Learning with Python" by Francois Chollet.

Say I'm using the IMDB movie review sentiment data and building a simple model like below:

# download IMDB movie review data
# keeping only the 10000 most frequently occurring words to ensure manageable-sized vectors
from keras.datasets import imdb

(train_data, train_labels), (test_data, test_labels) = imdb.load_data(
    num_words=10000)

# prepare the data
import numpy as np
# create an all 0 matrix of shape (len(sequences), dimension)
def vectorize_sequences(sequences, dimension=10000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        # set specific indices of results[i] = 1
        results[i, sequence] = 1.
    return results

# vectorize training data
x_train = vectorize_sequences(train_data)
# vectorize test data
x_test = vectorize_sequences(test_data)

# vectorize response labels
y_train = np.asarray(train_labels).astype('float32')
y_test = np.asarray(test_labels).astype('float32')

# build a model with L2 regularization
from keras import regularizers
from keras import models
from keras import layers

model = models.Sequential()
model.add(layers.Dense(16, kernel_regularizer=regularizers.l2(0.001),
                       activation='relu', input_shape=(10000,)))
model.add(layers.Dense(16, kernel_regularizer=regularizers.l2(0.001),
                       activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

The book gives an example of manually setting random dropout weights using the line below:

# at training time, zero out a random fraction of the values in the matrix
layer_output *= np.random.randint(0, high=2, size=layer_output.shape)

How would I 1) actually integrate that into my model, and 2) remove the dropout at test time?

EDIT: I'm aware of the built-in way of adding dropout, like the line below; I'm actually looking for a way to implement the above manually.

model.add(layers.Dropout(0.5))
coolhand

2 Answers


This can be implemented using a Lambda layer.

from keras import backend as K
def dropout(input):
    training = K.learning_phase()
    if training == 1 or training is True:
        # zero out roughly half of the values with a random 0/1 mask
        input *= K.cast(K.random_uniform(K.shape(input), minval=0, maxval=2, dtype='int32'), dtype='float32')
        # scale the surviving values up by 1/0.5 (inverted dropout)
        input /= 0.5
    return input

def get_model():
    model = models.Sequential()
    model.add(layers.Dense(16, kernel_regularizer=regularizers.l2(0.001),
                           activation='relu', input_shape=(10000,)))
    model.add(layers.Dense(16, kernel_regularizer=regularizers.l2(0.001),
                           activation='relu'))
    model.add(layers.Lambda(dropout))  # add dropout using a Lambda layer
    model.add(layers.Dense(1, activation='sigmoid'))
    print(model.summary())
    return model

K.set_learning_phase(1)
model = get_model()
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
weights = model.get_weights()
K.set_learning_phase(0)
model = get_model()
model.set_weights(weights)
print('model prediction is {}, label is {} '.format(model.predict(x_test[0][None]), y_test[0]))

model prediction is [[0.1484453]], label is 0.0
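
A possible refinement (my addition, not part of the original answer): the Keras backend also provides `K.in_train_phase`, which switches between a training-time expression and the plain input based on the current learning phase, so one compiled model serves both fit() and predict() without being rebuilt. A rough sketch, assuming the same 0.5 rate:

from keras import backend as K

def dropout_in_train_phase(x, rate=0.5):
    # training branch: zero out ~rate of the activations and scale the
    # survivors by 1/(1 - rate), i.e. inverted dropout
    def dropped():
        mask = K.cast(K.random_uniform(K.shape(x)) >= rate, K.floatx())
        return x * mask / (1.0 - rate)
    # at test time K.in_train_phase falls through to the unchanged input
    return K.in_train_phase(dropped, x)

# drop-in replacement for the Lambda layer above:
# model.add(layers.Lambda(dropout_in_train_phase))

This is roughly how the built-in Dropout layer is implemented internally.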

Manoj Mohan
  • I haven't tried this out yet, but it makes sense. One question: the line `model.add(layers.Lambda(dropout))` doesn't pass an `input` parameter for `dropout(input)`; I'm assuming this is inherited from the previous layer? – coolhand Jun 22 '19 at 23:00
  • Also, would you be willing to explain why you use `K.set_learning_phase(0)`, build the model, and then reset the weights? I'm assuming this is because you don't want to use dropout on the test data? – coolhand Jun 22 '19 at 23:09
  • Yes, during training/prediction the output from the previous Dense layer will be passed to the Lambda layer. Learning phase is set to 1 to indicate training and set to 0 for test. That is the Keras convention. – Manoj Mohan Jun 23 '19 at 03:07

How would I 1) actually integrate that into my model

Actually, that piece of Python code, which uses the numpy library, is only an illustration of how dropout works. It's not the way you should implement dropout in a Keras model. Rather, to use dropout in a Keras model you add a Dropout layer and give it a number between zero and one, which denotes the dropout rate:

from keras import layers

# ...
model.add(layers.Dropout(dropout_rate))
# add the rest of layers to the model ...

2) how would I remove the dropout at test time?

You don't need to do anything manually. It's handled by Keras automatically and is turned off at prediction time when you use the predict() method.
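
If you want to convince yourself of that, one way (my own sketch, assuming the Keras 2 / TF1-style backend and a model that actually contains a Dropout layer) is to feed the learning-phase flag explicitly through a backend function and compare the two outputs:

from keras import backend as K

# function taking the model input plus the learning-phase flag
check = K.function([model.input, K.learning_phase()], [model.output])

sample = x_test[:1]
out_train = check([sample, 1])[0]  # learning phase 1: dropout active
out_test = check([sample, 0])[0]   # learning phase 0: dropout is a no-op

Here out_train changes from call to call because of the random mask, while out_test is deterministic and matches model.predict(sample).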

today
  • Sorry, it looks like I added my edit about the same time you posted. I'm aware of the `layers.Dropout()` method; I'm looking for a way to adjust the features manually. – coolhand Jun 21 '19 at 17:15
  • @coolhand Oh, so you want to implement the dropout manually, right? – today Jun 21 '19 at 17:16
  • @coolhand It's relatively easy to do so. You just need to create a zeros-and-ones mask (where the probability of each element being zero is `dropout_rate`) and then multiply that mask with the input. You can either scale up the numbers at training time (by dividing the output of the remaining neurons by the keep probability, `1 - dropout_rate`) or instead multiply the output at prediction time by `1 - dropout_rate`. See the [definition](https://github.com/keras-team/keras/blob/c658993cf596fbd39cf800873bc457e69cfb0cdb/keras/layers/core.py#L81) of the `Dropout` layer in Keras (a consolidated sketch appears after these comments)... – today Jun 21 '19 at 18:01
  • @coolhand as well as the definition of [`tf.nn.dropout`](https://github.com/tensorflow/tensorflow/blob/93dd14dce2e8751bcaab0a0eb363d55eb0cc5813/tensorflow/python/ops/nn_ops.py#L2983) method (it uses the first method: scaling-up at training time) which is used internally by Keras when backend is set to TF. – today Jun 21 '19 at 18:02
  • @coolhand For generation of random mask you can use `keras.backend.random_uniform()` method (which produces a random tensor from a uniform distribution and by default each element of it is between zero and one). Then set to zero every element in that random tensor which is smaller than `dropout_rate`, e.g. like this: `mask = K.cast(mask > dropout_rate, K.floatx())` where `K` is actually `keras.backend`.... oh, why haven't I put all these into an answer? Never mind! – today Jun 21 '19 at 18:06
  • From the linked documentation, it looks like I'd add a layer like `model.add(Masking(mask_value=0., input_shape=(timesteps, features)))`, but I'm not exactly following how this would be implemented. Say (for the sake of argument) I want to mask every other value with 1.0 instead of zero; I'm not following how that would work. – coolhand Jun 21 '19 at 20:46
  • @coolhand Isn't that a totally different question? We are not supposed to do that (i.e. ask a different question on the same question page) here. But anyways, [this answer](https://stackoverflow.com/a/53470422/2099607) explains how masking works in Keras. – today Jun 22 '19 at 03:01
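
Pulling the comments above together, a rough sketch of the manual mask (my consolidation, using the scale-up-at-training-time variant; dropout_rate is a placeholder value):

from keras import backend as K
from keras import layers

dropout_rate = 0.5  # placeholder

def manual_dropout(x):
    # uniform noise in [0, 1); keep an element when its noise value exceeds
    # dropout_rate, i.e. with probability 1 - dropout_rate
    mask = K.cast(K.random_uniform(K.shape(x)) > dropout_rate, K.floatx())
    # scale the survivors up so the expected activation is unchanged
    return x * mask / (1.0 - dropout_rate)

# e.g. model.add(layers.Lambda(manual_dropout))

Note that, as written, this mask is applied at prediction time too; gating it with K.in_train_phase (as in the variant added under the other answer) restricts it to training.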