Some of the tutorials I came across described using a randomly initialized embedding matrix and then using the tf.nn.embedding_lookup
function to obtain the embeddings for the integer sequences. I am under the impression that since the embedding matrix
is created through tf.get_variable
, the optimizer would add the appropriate ops to update it.
What I don't understand is how backpropagation happens through the lookup function, which seems to be a hard selection rather than a soft one. What is the gradient of this operation with respect to one of its input IDs?
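To make the question concrete, here is a small NumPy sketch (all names and sizes are mine, just for illustration) of what I understand the lookup to do: a hard row selection by integer index, which is equivalent to a matrix product with one-hot vectors built from the IDs.

```python
import numpy as np

# Hypothetical sizes, purely for illustration.
vocab_size, embed_dim = 5, 3
rng = np.random.default_rng(0)

# Stand-in for a randomly initialized embedding matrix.
embedding_matrix = rng.standard_normal((vocab_size, embed_dim))

# An integer ID sequence, as would be fed to the lookup.
ids = np.array([2, 0, 2])

# Hard row selection (a gather), my understanding of the lookup.
looked_up = embedding_matrix[ids]

# The same result as a one-hot matrix product.
one_hot = np.eye(vocab_size)[ids]
via_matmul = one_hot @ embedding_matrix

assert np.allclose(looked_up, via_matmul)
```

In this view the output is differentiable with respect to the embedding matrix, but the IDs themselves only pick rows, which is where my confusion about the gradient comes from.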