I have difficulty understanding how exactly masking works in TensorFlow/Keras. On the Keras website (https://www.tensorflow.org/guide/keras/masking_and_padding) they simply say that the neural network layers skip/ignore the masked values, but they don't explain how. Does it force the weights to zero? (I know a boolean array is being created, but I don't know how it's being used.)
For example check this simple example:
import numpy as np
import tensorflow as tf

tf.random.set_seed(1)
embedding = tf.keras.layers.Embedding(input_dim=10, output_dim=3, mask_zero=True)
masked_output = embedding(np.array([[1, 2, 0]]))
print(masked_output)
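For context, I do understand how the boolean mask itself gets built — something like this NumPy equivalent (my reconstruction, not the actual Keras source):

```python
import numpy as np

# My understanding of what mask_zero=True computes: a boolean mask
# marking the non-zero input positions. The embedding values themselves
# are not modified, which would explain the identical printout below.
inputs = np.array([[1, 2, 0]])
mask = inputs != 0
print(mask)  # [[ True  True False]]
```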
I asked the Embedding layer to mask zero inputs. Now look at the output:
tf.Tensor(
[[[ 0.00300496 -0.02925059 -0.01254098]
[ 0.04872786 0.01087702 -0.03656749]
[ 0.00446818 0.00290152 -0.02269397]]], shape=(1, 3, 3), dtype=float32)
If you change the "mask_zero" argument to False you get the exact same results. Does anyone know what's happening behind the scenes? Any resources explaining the masking mechanism more thoroughly are highly appreciated.
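My rough mental model so far is that only layers which *consume* the mask (e.g. recurrent layers) behave differently, by skipping the state update at masked timesteps. A toy NumPy-free sketch of that guess (not the actual Keras implementation):

```python
# Hypothetical sketch of a mask-consuming recurrent step: at masked
# timesteps the previous state is carried forward unchanged, so the
# masked input never influences the result.
def toy_rnn(inputs, mask, w=0.5):
    state = 0.0
    for x, keep in zip(inputs, mask):
        if keep:                       # unmasked step: normal update
            state = w * state + x
        # masked step: state left unchanged (the step is "ignored")
    return state

print(toy_rnn([1.0, 2.0, 9.0], [True, True, False]))  # 2.5 (9.0 is skipped)
print(toy_rnn([1.0, 2.0, 9.0], [True, True, True]))   # 10.25 (all steps used)
```

Is this roughly what mask-aware layers do internally?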
P.S: This is also an example of a full Neural Network which gives an identical outcome with and without masking:
import numpy as np
import tensorflow as tf

tf.random.set_seed(1)
input = np.array([[1, 2, 0]])  # <--- 0 should be masked and ignored
embedding = tf.keras.layers.Embedding(input_dim=10, output_dim=3, mask_zero=True)
masked_output = embedding(input)
flatten = tf.keras.layers.Flatten()(masked_output)
dense_middle = tf.keras.layers.Dense(4)(flatten)
out = tf.keras.layers.Dense(1)(dense_middle)
print(out)
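My current guess for why this network is unaffected is that Flatten and Dense simply never look at the mask, so it only matters when a downstream layer actually uses it — for example some kind of mask-aware pooling. A pure-NumPy sketch of that idea (hypothetical, just to illustrate when a mask would change the result):

```python
import numpy as np

# Hypothetical mask-aware mean pooling over timesteps, compared with
# plain pooling that ignores the mask. Shapes mimic the Embedding
# output above: (batch=1, timesteps=3, features=3).
np.random.seed(1)
embedded = np.random.rand(1, 3, 3)      # stand-in for the Embedding output
mask = np.array([[True, True, False]])  # stand-in for the Keras boolean mask

plain_mean = embedded.mean(axis=1)      # averages over all 3 timesteps

m = mask[..., None].astype(embedded.dtype)                # (1, 3, 1), broadcastable
masked_mean = (embedded * m).sum(axis=1) / m.sum(axis=1)  # only unmasked timesteps

print(np.allclose(plain_mean, masked_mean))  # False: here the mask matters
```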