From the documentation on tf.keras.layers.Embedding:

input_dim:

Integer. Size of the vocabulary, i.e. maximum integer index + 1.

mask_zero:

Boolean, whether or not the input value 0 is a special “padding” value that should be masked out. This is useful when using recurrent layers which may take variable length input. If this is True, then all subsequent layers in the model need to support masking or an exception will be raised. If mask_zero is set to True, as a consequence, index 0 cannot be used in the vocabulary (input_dim should equal size of vocabulary + 1).

  1. I was reading this answer but I'm still confused. If my vocabulary size is n and the tokens are encoded with index values from 1 to n (0 is reserved for padding), is input_dim equal to n or n+1?

  2. If the inputs are padded with zeroes, what are the consequences of leaving mask_zero = False?

  3. If mask_zero = True, would I, based on the documentation, have to increment the answer from my first question by one? What is the expected behaviour if this is not done?
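To make the setup in these questions concrete, here is a minimal sketch of the kind of zero-padded input I mean (plain NumPy, with a made-up vocabulary size n = 5; tf.keras.preprocessing.sequence.pad_sequences would produce the same kind of array):

```python
import numpy as np

# Hypothetical setup: vocabulary of n = 5 tokens, encoded 1..5,
# with 0 reserved for padding.
n = 5
sequences = [[3, 1, 4], [2, 5]]

# Pad every sequence to a common length with trailing zeros.
max_len = max(len(s) for s in sequences)
padded = np.array([s + [0] * (max_len - len(s)) for s in sequences])
print(padded)
# → [[3 1 4]
#    [2 5 0]]

# The padded data contains the indices 0..n, i.e. n + 1 distinct values.
assert padded.min() == 0 and padded.max() == n
```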

desertnaut
Yandle

1 Answer

I am basically just trying to rephrase parts of the linked answer to make it a bit more understandable in the current context, and also address your other subquestions (which technically should be their own questions, according to [ask]).

  1. It does not matter whether you actually use 0 for padding or not: Keras assumes that you start indexing from zero and has to "brace itself" for an input value of 0 in your data. Therefore, you need to set input_dim to n+1, because you are essentially adding one specific value (the padding index 0) to a vocabulary that previously didn't include it.
  2. I think a detailed discussion is out of scope for this question, but - depending on the exact model - masking ensures that the loss values on padded positions do not affect the backpropagation. If you choose mask_zero = False, however, your model essentially has to correctly predict padding on all those positions, and the padding then also affects the training.
  3. This relates to my first point: essentially, you are adding a new vocabulary index. If you do not adjust the dimension accordingly, you will likely get an indexing error (out of range) for the vocabulary entry with the highest index (n). Otherwise, you would likely not notice any different behavior.
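To illustrate point 3: an Embedding layer's lookup is conceptually just row indexing into a weight matrix of shape (input_dim, output_dim). Here is a rough NumPy analogy (n and d are made-up values, not anything Keras prescribes):

```python
import numpy as np

n = 5  # vocabulary size; tokens encoded 1..n, 0 = padding
d = 4  # embedding dimension

# With input_dim = n + 1, every index 0..n maps to a valid row.
weights_ok = np.random.rand(n + 1, d)
vec = weights_ok[n]  # highest token index: fine

# With input_dim = n, the highest token index n is out of range.
weights_bad = np.random.rand(n, d)
try:
    weights_bad[n]
except IndexError as err:
    print("lookup failed:", err)
```

The actual error TensorFlow raises is worded differently (and on GPU an out-of-range lookup may even fail silently), but the underlying cause is the same off-by-one in the table size.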
dennlinger
  • If my vocabulary is indexed from `1` to `n` (0 as padding) and I set `mask_zero = True`, should input_dim be `n+1` and not `n+2`, since the size of my vocabulary is `n`? – Yandle Jun 18 '21 at 22:04