
I am trying to implement an embedding layer. The embedding will be initialized with pre-trained GloVe vectors. For words that can be found in GloVe, the embedding will be fixed; for words that don't appear in GloVe, it will be initialized randomly and will be trainable. How do I do this in TensorFlow? I am aware that tf.stop_gradient applies to a whole tensor; is there any kind of stop_gradient API for this kind of scenario, or is there any workaround? Any suggestion is appreciated.

Jerrik Eph

3 Answers

17

So the idea is to use a mask together with tf.stop_gradient to crack this problem:

res_matrix = tf.stop_gradient(mask_h * E) + mask * E,

where in the matrix mask, 1 marks the entries to which I want to apply the gradient and 0 marks the entries whose gradient should be zeroed out; mask_h is the inverse of mask (1 flipped to 0, 0 flipped to 1). We can then fetch embeddings from res_matrix. Here is the testing code:

import tensorflow as tf
import numpy as np

def entry_stop_gradients(target, mask):
    # Entries where mask == 0 only reach the output through tf.stop_gradient,
    # so no gradient flows back to them; entries where mask == 1 keep theirs.
    mask_h = tf.abs(mask - 1)
    return tf.stop_gradient(mask_h * target) + mask * target

mask = np.array([1., 0, 1, 1, 0, 0, 1, 1, 0, 1])
mask_h = np.abs(mask - 1)

emb = tf.constant(np.ones([10, 5]))

matrix = entry_stop_gradients(emb, tf.expand_dims(mask, 1))

parm = np.random.randn(5, 1)
t_parm = tf.constant(parm)

loss = tf.reduce_sum(tf.matmul(matrix, t_parm))
grad1 = tf.gradients(loss, emb)     # rows with mask == 0 get zero gradient
grad2 = tf.gradients(loss, matrix)  # full gradient, for comparison
print(matrix)
with tf.Session() as sess:
    print(sess.run(loss))
    print(sess.run([grad1, grad2]))
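
Applied to the embedding question itself, the same trick would look roughly like the sketch below. This is only a minimal sketch: the trainable_rows mask and the random initializer are made-up stand-ins for a real GloVe-based setup, and entry_stop_gradients is the function defined above.

import numpy as np
import tensorflow as tf

vocab_size, emb_dim = 10, 5

# Hypothetical row mask: 1.0 for words missing from GloVe (kept trainable),
# 0.0 for words found in GloVe (their gradient is blocked).
trainable_rows = np.array([0., 1, 0, 0, 1, 1, 0, 0, 1, 0], dtype=np.float32)

# Stand-in initializer; in practice the GloVe rows would be copied in here.
init = np.random.randn(vocab_size, emb_dim).astype(np.float32)
embedding_var = tf.Variable(init)

masked_embedding = entry_stop_gradients(embedding_var,
                                        tf.expand_dims(trainable_rows, 1))
word_ids = tf.constant([2, 4, 7])
embedded = tf.nn.embedding_lookup(masked_embedding, word_ids)
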
Jerrik Eph
  • You might want to update it because some functions have been updated. See https://github.com/tensorflow/tensorflow/issues/9162 for a usable code snippet. – Alan Jan 26 '19 at 07:14
1

I would suggest that you use two different tensors to hold your data: a tf.constant for your pre-trained data, and a tf.Variable for your new data to be trained. You can then combine the two with a concatenation or a similar tensor-joining operation.

Since the tf.constant can't be trained, you will not have to worry about stopping the gradient.
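
A rough sketch of that idea (assuming the vocabulary is ordered so that all GloVe-covered words come first; glove_vectors and num_oov are illustrative placeholders, not anything from the question):

import numpy as np
import tensorflow as tf

emb_dim = 50
glove_vectors = np.random.randn(400, emb_dim).astype(np.float32)  # stand-in for real GloVe rows
num_oov = 20  # number of words not covered by GloVe

fixed_part = tf.constant(glove_vectors)  # never updated by the optimizer
trainable_part = tf.Variable(tf.random_uniform([num_oov, emb_dim], -0.1, 0.1))

# Full embedding matrix: GloVe rows first, then the trainable OOV rows.
embedding_matrix = tf.concat([fixed_part, trainable_part], axis=0)

word_ids = tf.placeholder(tf.int32, [None])
embedded = tf.nn.embedding_lookup(embedding_matrix, word_ids)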

  • That way I will have to do a lot of preprocessing, which will make my code look kind of ugly. I will try using gather and scatter along with stop_gradient and see if that works. I really wish there were a feature to support this. Thanks though. – Jerrik Eph Apr 12 '17 at 09:49
1

I don't know much about word embeddings, but my understanding of your question is that you have a variable v and you want to train only certain (known) entries of it. You can achieve that by manipulating the gradients using a "mask", i.e. a constant tensor of the same shape as v that has value 1 for the trainable entries and 0 otherwise.

v = your_variable()
loss = your_loss()  # some loss that uses v
mask = your_mask_as_explained_above()
opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)

# Get the list (length 1 in our example) of (gradient, variable) pairs from the
# optimizer and extract the gradient w.r.t. v
grads_and_vars = opt.compute_gradients(loss, [v])
v_grad = grads_and_vars[0][0]

# Multiply the gradient by the mask before feeding it back to the optimizer;
# note that apply_gradients expects (gradient, variable) pairs.
sgd_step = opt.apply_gradients([(v_grad * mask, v)])

Depending on the complexity of your problem, this might not be an efficient solution, though, since the full gradient w.r.t. v is computed in each step. It is simply not applied in the optimizer update.

If you are not familiar with opt.compute_gradients and opt.apply_gradients, there's an explanation in the docs.
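
For completeness, here is one way this pattern could look with concrete stand-in tensors (the variable, loss, and mask below are arbitrary examples chosen only to make the snippet runnable):

import numpy as np
import tensorflow as tf

v = tf.Variable(np.ones([4, 3], dtype=np.float32))
mask = tf.constant([[1.], [0.], [1.], [0.]])  # rows 0 and 2 stay trainable
loss = tf.reduce_sum(tf.square(v))            # arbitrary stand-in loss

opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)
grads_and_vars = opt.compute_gradients(loss, [v])
v_grad = grads_and_vars[0][0]
train_step = opt.apply_gradients([(v_grad * mask, v)])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_step)
    print(sess.run(v))  # rows 1 and 3 are unchanged; rows 0 and 2 were updated
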

lballes
  • Thanks for your reply, I think your solution will work. I have just come up with another idea, which I have posted in a separate answer. – Jerrik Eph Apr 12 '17 at 11:39