
I would like to extract and store the dropout mask [array of 1/0s] from a dropout layer in a Sequential Keras model at each batch while training. I was wondering if there is a straightforward way to do this within Keras or if I would need to switch over to TensorFlow (How to get the dropout mask in Tensorflow).

Would appreciate any help! I'm quite new to TensorFlow and Keras.

There are a couple of functions for the dropout layer (dropout_layer.get_output_mask(), dropout_layer.get_input_mask()) that I tried using, but I got None after calling them on the previous layer.

import tensorflow as tf

model = tf.keras.Sequential()
model.add(tf.keras.layers.Flatten(name="flat", input_shape=(28, 28, 1)))
model.add(tf.keras.layers.Dense(
    512,
    activation='relu',
    name = 'dense_1',
    kernel_initializer=tf.keras.initializers.GlorotUniform(seed=123),
    bias_initializer='zeros'))
dropout = tf.keras.layers.Dropout(0.2, name = 'dropout') #want this layer's mask

model.add(dropout)
x = dropout.output_mask
y = dropout.input_mask
model.add(tf.keras.layers.Dense(
    10,
    activation='softmax',
    name='dense_2',
    kernel_initializer=tf.keras.initializers.GlorotUniform(seed=123),
    bias_initializer='zeros'))

model.compile(...)
model.fit(...)

2 Answers


It's not easily exposed in Keras. The call chain goes quite deep before it reaches the TensorFlow dropout op.

So, although you're using Keras, the mask will also be a tensor in the graph that can be fetched by name (to find its name, see: In Tensorflow, get the names of all the Tensors in a graph).

This option will, of course, lack some Keras metadata; you would probably have to do it inside a Lambda layer so Keras attaches the necessary information to the tensor. And you must take extra care, because the tensor will exist even when not training (when the mask is skipped).
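For illustration, here's a rough sketch of that name-based lookup (TF 1.x graph mode; the tensor name below is only a placeholder, you'd have to find the real one by listing the graph's ops):

import tensorflow as tf

# TF 1.x graph-mode sketch: list every op in the default graph, then fetch
# the tensor produced inside the dropout layer by name. The name used here
# is a guess -- inspect the printed list to find the real one in your graph.
graph = tf.compat.v1.get_default_graph()
for op in graph.get_operations():
    print(op.name)

mask_like = graph.get_tensor_by_name("dropout/cond/dropout/Cast:0")  # placeholder name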

Now, you can also use a less hacky way, though it may cost a little extra processing:

def getMask(x):
    boolMask = tf.not_equal(x, 0)
    floatMask = tf.cast(boolMask, tf.float32) #or tf.float64
    return floatMask

Use `Lambda(getMask)(output_of_dropout_layer)`.

But instead of using a Sequential model, you will need a functional API Model.

inputs = tf.keras.layers.Input((28, 28, 1))
outputs = tf.keras.layers.Flatten(name="flat")(inputs)
outputs = tf.keras.layers.Dense(
    512,
    #    activation='relu', #relu will be a problem here
    name = 'dense_1',
    kernel_initializer=tf.keras.initializers.GlorotUniform(seed=123),
    bias_initializer='zeros')(outputs)

outputs = tf.keras.layers.Dropout(0.2, name = 'dropout')(outputs)
mask = tf.keras.layers.Lambda(getMask)(outputs)
#there isn't "input_mask"


#add the missing relu: 
outputs = tf.keras.layers.Activation('relu')(outputs)
outputs = tf.keras.layers.Dense(
    10,
    activation='softmax',
    name='dense_2',
    kernel_initializer=tf.keras.initializers.GlorotUniform(seed=123),
    bias_initializer='zeros')(outputs)

model = tf.keras.Model(inputs, outputs)
model.compile(...)
model.fit(...)

Training and predicting

Since you can't train the masks (it doesn't make any sense), they should not be outputs of the model during training.

Now, we could try this:

trainingModel = tf.keras.Model(inputs, outputs)
predictingModel = tf.keras.Model(inputs, [outputs, mask])

But masks don't exist in prediction, because dropout is only applied in training. So this doesn't bring us anything good in the end.
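That said, if you only need the mask values for inspection, a possible workaround (a TF 2.x eager sketch, using predictingModel from above) is to call the model directly with training=True, so the dropout mask is actually generated:

import numpy as np

# Calling the model with training=True forces dropout (and hence the mask)
# to be applied, even though we're not inside fit().
x_batch = np.random.rand(32, 28, 28, 1).astype("float32")
preds, mask_values = predictingModel(x_batch, training=True)
print(preds.shape, mask_values.numpy().shape)  # (32, 10) (32, 512): one 0./1. entry per unit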

The only way for training is then using a dummy loss and dummy targets:

def dummyLoss(y_true, y_pred):
    return y_true  # this may cause a "None" gradient problem, since the mask output is not connected to any trainable weights

model.compile(loss=[loss_for_main_output, dummyLoss], ....)

model.fit(x_train, [y_train, np.zeros((len(y_train),) + mask_shape)], ...)

It's not guaranteed that these will work.

  • A more or less accurate approach is to pass the input of the dropout layer to the `Lambda` layer and condition on its non-zero elements. Otherwise, it's not necessarily the case that if the output of dropout is zero then that neuron has been dropped (i.e. it might have been zero itself). Even this is not 100% accurate (i.e. if both input and output are zero, then you don't know whether the corresponding neuron has been dropped or not). – today Sep 21 '19 at 07:29
  • Interesting... :) - Thank you for the tip. I'll probably update the answer in a while, but the probability of something going to an exact zero without the system forcing is waaay low. – Daniel Möller Sep 21 '19 at 16:31
  • Thank you so much! Question: Do you have to do anything special to get the `mask` from `Lambda(getMask)(outputs)`? I assumed that the Lambda layer would appear in the model summary (it didn't) and I could use model.layer.output[0] to get it? Would I need to make a callback to extract the mask variable? Or have I missed something obvious? (I'm using tensorflow 2.0) Thanks again for the help! – holighost Sep 22 '19 at 03:21
  • Well, you didn't use `mask` for anything, so it's not in the path of your model's outputs. That's why it doesn't appear. You can input it to another layer and use it normally as with any other layer output. If you want it to be one of your model's outputs, just make it a model's output: `model = Model(inputs, [outputs, mask])`. – Daniel Möller Sep 22 '19 at 03:40
  • I tried making it one of the model's outputs. But, I could no longer call `model.fit()` (Error when checking model target: the list of Numpy arrays that you are passing to your model is not the size the model expected.) I didn't change the input at all, so I don't understand why this is happening, especially since the `lambda` layer is sharing the input with the rest of the model. I'm using a `tf.data.Dataset` object for the inputs. Truthfully, all I wanted to do was save the dropout mask as a numpy array at each batch during training (difficult for a noob). I really appreciate your help! – holighost Sep 22 '19 at 04:08
  • Well, since it's a model output, you are going to need data (target) for it. `y_train` must be then a list `[y_train_main_output, y_train_mask]`. So, either you create a separate model with mask only for prediction, or you pass dummy targets for this output, and define a dummy loss too. – Daniel Möller Sep 22 '19 at 06:34
  • By the way, thinking about many workarounds, I see there are a lot of problems in all approaches. So... what are you going to do with these masks? – Daniel Möller Sep 22 '19 at 06:38
  • Gotya. I'll try it out. Yeah, definitely surprised by how difficult it is to pull and save the dropout mask. I'm doing some analysis on the weights matrix and will use the mask as a feature of the nodes in different layers. Do you think I should ditch Keras and work directly with tensorflow? Not sure, if that would really help either though. – holighost Sep 22 '19 at 16:27
  • But what's the point of saving it if it will vary for every batch? It's a random tensor. It's easier to just pass it to the layers you want and use it. – Daniel Möller Sep 22 '19 at 16:52
  • I need to know if a given node was dropped out at each batch for the project I'm doing. Being dropped out represents being disconnected in my application and that is important. I understand that its just a random tensor though. So, I need to use the mask as normal but I also need to save it for inspection later. – holighost Sep 22 '19 at 17:36
  • With this explained, the approach I'd suggest is making your own training loop using eager execution. (Keras does not expose the results of training in any easy way) – Daniel Möller Sep 22 '19 at 18:08
  • Another fun adventure. Thank you for your help! – holighost Sep 22 '19 at 18:26

I found a very hacky way to do this by trivially extending the provided dropout layer. (Almost all code from TF.)

import tensorflow as tf
from tensorflow.python.framework import tensor_shape
from tensorflow.python.ops import array_ops, math_ops


class MyDR(tf.keras.layers.Layer):
    def __init__(self, rate, **kwargs):
        super(MyDR, self).__init__(**kwargs)
        self.noise_shape = None
        self.rate = rate

    def _get_noise_shape(self, x, noise_shape=None):
        # If noise_shape is None, return immediately.
        if noise_shape is None:
            return array_ops.shape(x)
        try:
            # Best effort to figure out the intended shape.
            # If not possible, let the op handle it.
            # In eager mode the exception will show up.
            noise_shape_ = tensor_shape.as_shape(noise_shape)
        except (TypeError, ValueError):
            return noise_shape

        if x.shape.dims is not None and len(x.shape.dims) == len(noise_shape_.dims):
            new_dims = []
            for i, dim in enumerate(x.shape.dims):
                if noise_shape_.dims[i].value is None and dim.value is not None:
                    new_dims.append(dim.value)
                else:
                    new_dims.append(noise_shape_.dims[i].value)
            return tensor_shape.TensorShape(new_dims)

        return noise_shape

    def build(self, input_shape):
        self.noise_shape = input_shape
        print(self.noise_shape)
        super(MyDR, self).build(input_shape)

    @tf.function
    def call(self, input):
        self.noise_shape = self._get_noise_shape(input)
        random_tensor = tf.random.uniform(self.noise_shape, seed=1235, dtype=input.dtype)
        keep_prob = 1 - self.rate
        scale = 1 / keep_prob
        # NOTE: if (1.0 + rate) - 1 is equal to rate, then we want to consider that
        # float to be selected, hence we use a >= comparison.
        self.keep_mask = random_tensor >= self.rate
        # NOTE: here is where I save the binary masks.
        # The file grows quite big!
        tf.print(self.keep_mask, output_stream="file://temp/droput_mask.txt")

        ret = input * scale * math_ops.cast(self.keep_mask, input.dtype)
        return ret
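A short usage sketch (hypothetical setup, assuming eager TF 2.x and the class above): train a small model containing MyDR with the normal fit() loop, and each batch's keep_mask is appended to the file by the tf.print() call inside call():

import numpy as np
import tensorflow as tf

# Hypothetical model around MyDR; layer sizes and data are placeholders.
inputs = tf.keras.layers.Input((784,))
hidden = tf.keras.layers.Dense(512, activation='relu')(inputs)
dropped = MyDR(0.2, name='my_dropout')(hidden)
outputs = tf.keras.layers.Dense(10, activation='softmax')(dropped)
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

x = np.random.rand(256, 784).astype('float32')
y = np.random.randint(0, 10, size=(256,))
model.fit(x, y, batch_size=32, epochs=1)
# The mask for every training batch is now in temp/droput_mask.txt.
# Note: MyDR as written applies dropout on every call, including inference.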