Some of the tutorials I came across described using a randomly initialized embedding matrix and then using the tf.nn.embedding_lookup
function to obtain the embeddings for the integer sequences. I am under the impression that since the embedding matrix
is created through tf.get_variable
, the optimizer would add the appropriate ops to update it.
What I don't understand is how backpropagation happens through the lookup function, which seems to be a hard selection rather than a soft one. What is the gradient of this operation with respect to one of its input IDs?
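To make the question concrete, here is a small NumPy sketch (all names and sizes are mine, just for illustration) of what I understand the lookup to do: a hard row selection by integer index, which is equivalent to a matrix product with one-hot vectors built from the IDs.

```python
import numpy as np

# Hypothetical sizes, purely for illustration.
vocab_size, embed_dim = 5, 3
rng = np.random.default_rng(0)

# Stand-in for a randomly initialized embedding matrix.
embedding_matrix = rng.standard_normal((vocab_size, embed_dim))

# An integer ID sequence, as would be fed to the lookup.
ids = np.array([2, 0, 2])

# Hard row selection (a gather), my understanding of the lookup.
looked_up = embedding_matrix[ids]

# The same result as a one-hot matrix product.
one_hot = np.eye(vocab_size)[ids]
via_matmul = one_hot @ embedding_matrix

assert np.allclose(looked_up, via_matmul)
```

In this view the output is differentiable with respect to the embedding matrix, but the IDs themselves only pick rows, which is where my confusion about the gradient comes from.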