
I have difficulty understanding how exactly masking works in TensorFlow/Keras. On the Keras website (https://www.tensorflow.org/guide/keras/masking_and_padding) they simply say that the neural network layers skip/ignore the masked values, but they don't explain how. Does it force the weights to zero? (I know a boolean array is being created, but I don't know how it's being used.)

For example, check this simple snippet:

import numpy as np
import tensorflow as tf

tf.random.set_seed(1)

embedding = tf.keras.layers.Embedding(input_dim=10, output_dim=3, mask_zero=True)
masked_output = embedding(np.array([[1,2,0]]))
print(masked_output)

I asked the Embedding layer to mask zero inputs. Now look at the output:

tf.Tensor(
[[[ 0.00300496 -0.02925059 -0.01254098]
  [ 0.04872786  0.01087702 -0.03656749]
  [ 0.00446818  0.00290152 -0.02269397]]], shape=(1, 3, 3), dtype=float32)

If you change the "mask_zero" argument to False you get the exact same results. Does anyone know what's happening behind the scenes? Any resources explaining the masking mechanism more thoroughly would be highly appreciated.

P.S.: Here is also an example of a full neural network which gives an identical outcome with and without masking:

tf.random.set_seed(1)
input = np.array([[1,2,0]]) # <--- 0 should be masked and ignored
embedding = tf.keras.layers.Embedding(input_dim=10, output_dim=3, mask_zero=True)
masked_output = embedding(input)
flatten = tf.keras.layers.Flatten()(masked_output)
dense_middle = tf.keras.layers.Dense(4)(flatten)
out = tf.keras.layers.Dense(1)(dense_middle)
print(out)
Amin Shn
  • Does this answer your question? [How does mask_zero in Keras Embedding layer work?](https://stackoverflow.com/questions/47485216/how-does-mask-zero-in-keras-embedding-layer-work) – Franciska Feb 10 '23 at 12:59
  • @Franciska Not really, the answer mostly repeats TensorFlow's manual, which is not remotely clear. For example, what does "ignore" mean? In math there is no such term, and we are doing math in NNs. Does "ignore" mean setting weights to zero? Also, I gave an example here which shows that the mask doesn't affect the following layers at all (as opposed to the answer given in that link). – Amin Shn Feb 10 '23 at 13:16

1 Answer


In TensorFlow/Keras, masking lets sequence-processing layers know that certain timesteps of an input are padding and should be ignored during the forward pass. This is helpful when dealing with sequences of varying length, where padding is used to make all sequences the same length. Importantly, masking does not force any weights to zero. The mask is a boolean tensor of shape (batch_size, timesteps) that travels alongside the data, and each layer that supports masking decides for itself what to do with it: an RNN layer, for example, simply skips the masked timesteps and carries its state through them unchanged.

In the example you provided, the Embedding layer is set to mask zeros via the mask_zero argument, yet the printed values are identical whether mask_zero is True or False. That is expected: mask_zero does not change the embedding vectors themselves (index 0 is still looked up like any other index). It only attaches a boolean mask to the output tensor, which downstream layers may or may not consume.
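
You can check this directly: the mask is attached to the output tensor as a _keras_mask attribute, as the masking guide also shows. Continuing your snippet:

import numpy as np
import tensorflow as tf

embedding = tf.keras.layers.Embedding(input_dim=10, output_dim=3, mask_zero=True)
masked_output = embedding(np.array([[1, 2, 0]]))

# The embedding values are ordinary lookups; the mask rides along as metadata
print(masked_output._keras_mask)  # tf.Tensor([[ True  True False]], shape=(1, 3), dtype=bool)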

Under the hood, Keras does not multiply the mask into the data. It propagates the mask as a separate boolean tensor alongside the outputs: a layer that produces a mask exposes it through compute_mask(), and a layer that can use one receives it through the mask argument of its call() method. Layers that opt in via supports_masking pass the mask through to the next layer; by default, a layer that does not support masking simply stops the mask from propagating any further.
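
Here is a minimal sketch of that hand-off, using a hypothetical pass-through layer (PrintMask is just an illustrative name): it opts into masking and prints whatever mask it receives.

import numpy as np
import tensorflow as tf

class PrintMask(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.supports_masking = True  # pass the incoming mask through unchanged

    def call(self, inputs, mask=None):
        # The boolean mask attached upstream is delivered here as `mask`
        tf.print("mask received:", mask)
        return inputs

embedding = tf.keras.layers.Embedding(input_dim=10, output_dim=3, mask_zero=True)
out = PrintMask()(embedding(np.array([[1, 2, 0]])))  # prints [[1 1 0]], i.e. [[True True False]]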

Example (with comments on what each layer does with the mask):

import numpy as np
import tensorflow as tf

inputs = tf.keras.layers.Input(shape=(3,))
# mask_zero=True attaches a mask marking the positions of zero tokens
embedding = tf.keras.layers.Embedding(input_dim=10, output_dim=3, mask_zero=True)(inputs)
# Masking() recomputes the mask from the embedded values (it masks timesteps
# whose features are all 0.0), so it adds little after an Embedding layer
masking = tf.keras.layers.Masking()(embedding)
flatten = tf.keras.layers.Flatten()(masking)  # Flatten and Dense do not consume masks
dense_middle = tf.keras.layers.Dense(4)(flatten)
output = tf.keras.layers.Dense(1)(dense_middle)
model = tf.keras.Model(inputs, output)

However, as the comments below confirm, even this model produces identical predictions whether mask_zero is True or False: nothing downstream actually consumes the mask, since Flatten and Dense are mask-agnostic. To see masking change a result, a mask-consuming layer such as an LSTM has to sit between the Embedding and the rest of the network.
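
A minimal sketch of that difference, assuming an LSTM after the Embedding; the layer weights are identical in both calls, and only the mask differs (an all-True mask is equivalent to no masking):

import numpy as np
import tensorflow as tf

tf.random.set_seed(1)
inp = np.array([[1, 2, 0]])

embedding = tf.keras.layers.Embedding(input_dim=10, output_dim=3, mask_zero=True)
lstm = tf.keras.layers.LSTM(2)  # a mask-consuming layer

embedded = embedding(inp)
real_mask = embedding.compute_mask(inp)  # [[ True  True False]]

# Same weights in both calls; only the mask differs
print(lstm(embedded, mask=real_mask))                          # final timestep skipped
print(lstm(embedded, mask=tf.constant([[True, True, True]])))  # all timesteps processed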

  • The last element of the input is actually zero. Also I added another part you might want to check. – Amin Shn Feb 10 '23 at 12:16
  • It is true that the final element of the input in this example is zero, which gets disregarded when "mask_zero" is enabled. Consequently, the final element of the embedded input won't be taken into account when training or making predictions, and its gradients will not be calculated during the training procedure. Also, I checked out the other part that you included. – silentlyakitten Feb 10 '23 at 12:27
  • When "mask_zero" is set to false, the Embedding layer does not apply a mask to the input and accounts for every element in training and forecasting. The output is then pushed through the Flatten layer, which reduces the shape to (1, 9). This flattened output is then handled by two Dense layers, creating the final output of (1, 1). Regardless of whether the "mask_zero" is false, the zero element is still accounted for in the calculation of the output, so the result is the same. – silentlyakitten Feb 10 '23 at 12:28
  • So are you saying the mask is not broadcasted through the network? How can I make it broadcasted so that the network uses zero if the mask_zero is false and not using it when it is True and result in different predictions? – Amin Shn Feb 10 '23 at 12:30
  • The mask created by the Embedding layer with the "mask_zero" argument set to True is not automatically broadcasted to subsequent layers. To pass the mask information, you need to wrap the output of the Embedding layer with a Keras Masking layer – silentlyakitten Feb 10 '23 at 12:37
  • I would appreciate it if you could post the solution. It would benefit other users as well. – Amin Shn Feb 10 '23 at 12:39
  • I just used tf.random.set_seed(1) before your example code and tried print(model.predict(np.array([[1,2,0]]))), and the result is exactly the same with mask_zero = True and with mask_zero = False. Please try it yourself. – Amin Shn Feb 10 '23 at 12:55
  • @AminShn did you wrap the output of the Embedding layer with a Masking layer like this?: https://pastebin.com/raw/G16FiBAg – silentlyakitten Feb 10 '23 at 13:03
  • I copied and pasted your posted code and ran it exactly as you've written it. You should have included tf.random.set_seed(1) before your code, otherwise random weights will be generated each time, resulting in different results regardless of the masking. – Amin Shn Feb 10 '23 at 13:09
  • I'm sorry for the oversight. Yes, if the random seed is not established prior to the code being executed, different weights will be generated for each run and the results could be different even if the mask_zero argument is set to the same value. Appreciate you pointing this out. – silentlyakitten Feb 10 '23 at 13:12