
Suppose I have two inputs (each with a number of features) that I want to feed into a Dropout layer. On each iteration I want to drop out one whole input, with all of its associated features, and keep the whole of the other input.

After concatenating the inputs, I think I need to use the noise_shape parameter of Dropout, but the shape of the concatenated layer doesn't really let me do that. For two inputs of shape (15,), the concatenated shape is (None, 30) rather than (None, 15, 2), so one of the axes is lost and I can't drop out along it.

Any suggestions for what I could do? Thanks.

from keras.layers import Input, concatenate, Dropout

x = Input((15,))  # 15 features for the 1st input
y = Input((15,))  # 15 features for the 2nd input
xy = concatenate([x, y])
print(xy._keras_shape)
# (None, 30)

layer = Dropout(rate=0.5, noise_shape=[xy.shape[0], 1])(xy)  # stuck: (None, 30) has no per-input axis to drop along
...
foxpal
  • One way that I've implemented this is inside a generator. You can build a generator that returns the two inputs and, with probability 0.5, randomly returns either one of them as all zeros (see the sketch below). – Mohamad Zeina Sep 11 '19 at 13:20
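
A minimal sketch of that generator idea (the function name and signature are hypothetical, not from the thread); it assumes NumPy arrays and yields batches that could be passed to fit_generator:

import numpy as np

def dropout_generator(x_data, y_data, labels, batch_size=32):
    # Hypothetical helper, not code from the thread: with probability
    # 0.5, one of the two inputs (chosen at random) is zeroed out for
    # the whole batch, mimicking input-level dropout.
    n = len(labels)
    while True:
        idx = np.random.randint(0, n, size=batch_size)
        xb, yb = x_data[idx].copy(), y_data[idx].copy()
        if np.random.rand() < 0.5:
            victim = xb if np.random.rand() < 0.5 else yb
            victim[:] = 0.0  # drop this whole input for the batch
        yield [xb, yb], labels[idx]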

1 Answer


EDIT:

It seems I misunderstood your question; here is the updated answer based on your requirement.

To achieve what you want, give each input a channel dimension, so that the 15 features become the timesteps and x and y become the two channels on the last axis. According to the Keras documentation, for an input of shape (batch_size, timesteps, features), noise_shape=(batch_size, 1, features) shares the same dropout mask across all timesteps, so each channel (the whole of x or the whole of y) is kept or dropped as a unit:

x = Input((15, 1))  # 15 features for the 1st input, shaped (timesteps, channels)
y = Input((15, 1))  # 15 features for the 2nd input
xy = concatenate([x, y])  # (None, 15, 2): the last axis holds x and y

dropout_layer = Dropout(rate=0.5, noise_shape=[None, 1, 2])(xy)
...

To test that you are getting the correct behavior, you can inspect the intermediate xy layer and dropout_layer using the following code:

### Define your model ###

from keras.layers import Input, concatenate, Dropout
from keras.models import Model
from keras import backend as K

# Learning phase must be set to 1 for dropout to work
K.set_learning_phase(1)

x = Input((15, 1))  # 15 features for the 1st input
y = Input((15, 1))  # 15 features for the 2nd input
xy = concatenate([x, y])

dropout_layer = Dropout(rate=0.5, noise_shape=[None, 1, 2])(xy)

model = Model(inputs=[x, y], outputs=dropout_layer)

# build a function mapping the model inputs to the outputs of every
# non-Input layer (the concatenate layer and the Dropout layer)

x_inp = model.input[0]
y_inp = model.input[1]
outp = [layer.output for layer in model.layers[2:]]
functor = K.function([x_inp, y_inp], outp)

### Get some random inputs ###

import numpy as np

input_1 = np.random.random((1, 15, 1))
input_2 = np.random.random((1, 15, 1))

layer_outs = functor([input_1, input_2])
print('Intermediate xy layer:\n\n', layer_outs[0])
print('Dropout layer:\n\n', layer_outs[1])

You should see that the whole of x or the whole of y is dropped randomly (50% chance each), per your requirement:

Intermediate xy layer:

 [[[0.32093528 0.70682645]
  [0.46162075 0.74063486]
  [0.522718   0.22318116]
  [0.7897043  0.7849486 ]
  [0.49387926 0.13929296]
  [0.5754296  0.6273373 ]
  [0.17157765 0.92996144]
  [0.36210892 0.02305864]
  [0.52637625 0.88259524]
  [0.3184462  0.00197006]
  [0.67196816 0.40147918]
  [0.24782693 0.5766827 ]
  [0.25653633 0.00514544]
  [0.8130438  0.2764429 ]
  [0.25275478 0.44348967]]]

Dropout layer:

 [[[0.         1.4136529 ]
  [0.         1.4812697 ]
  [0.         0.44636232]
  [0.         1.5698972 ]
  [0.         0.2785859 ]
  [0.         1.2546746 ]
  [0.         1.8599229 ]
  [0.         0.04611728]
  [0.         1.7651905 ]
  [0.         0.00394012]
  [0.         0.80295837]
  [0.         1.1533654 ]
  [0.         0.01029088]
  [0.         0.5528858 ]
  [0.         0.88697934]]]

If you are wondering why all the surviving elements are multiplied by 2: TensorFlow uses inverted dropout, which scales the kept units by 1 / (1 - rate) (here 1 / 0.5 = 2) so that the expected value of each unit stays the same; take a look at how TensorFlow implements dropout.
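
To make the scaling concrete, here is a minimal NumPy sketch of inverted dropout (an illustration, not TensorFlow's actual implementation):

import numpy as np

def inverted_dropout(x, rate=0.5):
    # keep each unit with probability (1 - rate), then scale the
    # survivors by 1 / (1 - rate) so the expected value is unchanged
    mask = (np.random.rand(*x.shape) >= rate).astype(x.dtype)
    return x * mask / (1.0 - rate)

x = np.array([0.2, 0.4, 0.6, 0.8])
print(inverted_dropout(x))  # surviving entries appear doubled at rate=0.5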

Hope this helps.

Toukenize
  • Thanks for the suggestion. Regarding the 2nd point, doesn't the dropout drop individual features of `x` with `rate=0.5`, rather than the whole of `x`? – foxpal Sep 11 '19 at 03:47
  • It selects 50% of the whole of `x` to be dropped out randomly. According to the documentation - `Dropout consists in randomly setting a fraction rate of input units to 0 at each update during training time, which helps prevent overfitting.` and `rate: float between 0 and 1. Fraction of the input units to drop.` – Toukenize Sep 11 '19 at 03:50
  • Your explanation and the documentation sound to me like individual features of `x` get dropped with probability `rate`. What I want is to either keep or drop `x` with probability `rate`. I wish there was an easy way to test what gets dropped. – foxpal Sep 11 '19 at 07:21
  • @foxpal Sorry, I wasn't aware that you wanted to either keep or drop `x` and `y` entirely with probability `rate`. Updated my answer to suit your application and included a code snippet on how to inspect the output. Cheers. – Toukenize Sep 11 '19 at 09:07
  • That's great, thanks! The 2nd part basically answers my other question as well: https://stackoverflow.com/q/57882172. Just one follow-up question: sometimes neither `x` nor `y` get dropped, but `xy` values still get doubled. Isn't this _wrong_? – foxpal Sep 12 '19 at 01:37
  • Glad it helped. On the doubling of `xy` when neither gets dropped, I wouldn't say it is wrong, but rather a limitation of how dropout is implemented in TensorFlow. The intent of doubling them when `p = 0.5` is to approximate the original output (without dropout), which is achieved over most of the updates (probabilistically speaking, especially when there are more timesteps). – Toukenize Sep 12 '19 at 05:38
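
For intuition on that last point, a quick empirical check (a hypothetical NumPy sketch, not from the thread) shows that the 1 / (1 - rate) scaling preserves the expected value even though any single draw is either 0 or doubled:

import numpy as np

rate = 0.5
x = 0.7
# each draw is either 0 (dropped) or x / (1 - rate) (kept and rescaled)
draws = np.where(np.random.rand(100_000) >= rate, x / (1 - rate), 0.0)
print(draws.mean())  # ~0.7, since E = (1 - rate) * x / (1 - rate) = x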