I'm trying to modify this TensorFlow LSTM model to load the pre-trained GoogleNews word embedding GoogleNews-vectors-negative300.bin (a TensorFlow Word2Vec embedding would be just as good).
I've been reading examples of how to load a pre-trained word embedding into TensorFlow (e.g. 1: here, 2: here, 3: here and 4: here).
In the first linked example they simply assign the embedding to the graph:
sess.run(cnn.W.assign(initW))
In the second linked example they create an embedding-wrapper variable:
with tf.variable_scope("embedding_rnn_seq2seq/rnn/embedding_wrapper", reuse=True):
    em_in = tf.get_variable("embedding")
then they initialize the embedding wrapper:
sess.run(em_in.assign(initW))
Both of those examples make sense, but it's not obvious to me how to assign the unpacked embedding initW to the TF graph in my case (I'm a TF beginner).
I can prepare initW like the first two examples:
import os
import numpy as np

def loadEmbedding(self, word_to_id):
    # New model: load the pre-trained word2vec data and initialize embeddings
    with open(os.path.join('GoogleNews-vectors-negative300.bin'), "rb", 0) as f:
        header = f.readline()
        vocab_size, vector_size = map(int, header.split())
        binary_len = np.dtype('float32').itemsize * vector_size
        # Words missing from the pre-trained vocabulary keep a random init
        initW = np.random.uniform(-0.25, 0.25, (len(word_to_id), vector_size))
        for line in range(vocab_size):
            word = []
            while True:
                ch = f.read(1)
                if ch == b' ':
                    word = b''.join(word).decode('utf-8')
                    break
                if ch != b'\n':
                    word.append(ch)
            if word in word_to_id:
                # np.frombuffer replaces the deprecated np.fromstring
                initW[word_to_id[word]] = np.frombuffer(f.read(binary_len), dtype='float32')
            else:
                f.read(binary_len)
    return initW
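To convince myself the parsing logic above is right, here is a self-contained round-trip sketch: it writes a tiny word2vec-format binary (header line, then `word<space><float32 vector>` records) into an in-memory buffer and reads it back with the same loop. The helper names `write_word2vec_bin` / `read_word2vec_bin` and the toy vocabulary are made up for illustration only.

```python
import io
import numpy as np

def write_word2vec_bin(vectors):
    # Build a minimal word2vec-style binary file in memory
    buf = io.BytesIO()
    dim = len(next(iter(vectors.values())))
    buf.write(f"{len(vectors)} {dim}\n".encode("utf-8"))
    for word, vec in vectors.items():
        buf.write(word.encode("utf-8") + b" ")
        buf.write(np.asarray(vec, dtype=np.float32).tobytes())
    buf.seek(0)
    return buf

def read_word2vec_bin(f, word_to_id, init_scale=0.25):
    # Same parsing logic as loadEmbedding above, on an open binary stream
    header = f.readline()
    vocab_size, vector_size = map(int, header.split())
    binary_len = np.dtype(np.float32).itemsize * vector_size
    initW = np.random.uniform(-init_scale, init_scale,
                              (len(word_to_id), vector_size)).astype(np.float32)
    for _ in range(vocab_size):
        chars = []
        while True:
            ch = f.read(1)
            if ch == b' ':
                word = b''.join(chars).decode('utf-8')
                break
            if ch != b'\n':
                chars.append(ch)
        raw = f.read(binary_len)
        if word in word_to_id:
            initW[word_to_id[word]] = np.frombuffer(raw, dtype=np.float32)
    return initW

vectors = {"cat": [1.0, 2.0], "dog": [3.0, 4.0]}
word_to_id = {"cat": 0, "dog": 1, "<unk>": 2}
initW = read_word2vec_bin(write_word2vec_bin(vectors), word_to_id)
```

Rows for words found in the file get their pre-trained vectors; rows for out-of-vocabulary words (here `<unk>`) keep the random initialization.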
From the solution in example 4, I thought I should be able to do something like:
session.run(tf.assign(embedding, initW))
If I try to add that line when the session is initialized, like this:
with sv.managed_session() as session:
    initializer = tf.random_uniform_initializer(-config.init_scale,
                                                config.init_scale)
    session.run(tf.assign(m.embedding, initW))
I get the following error:
ValueError: Fetch argument <tf.Tensor 'Assign:0' shape=(10000, 300) dtype=float32_ref> cannot be interpreted as a Tensor. (Tensor Tensor("Assign:0", shape=(10000, 300), dtype=float32_ref, device=/device:CPU:0) is not an element of this graph.)
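The error message suggests the assign op is being created in a different graph than the one the session runs (tf.train.Supervisor also finalizes the model's graph once the managed session starts, so new ops like `tf.assign(...)` can't be added afterwards). A common pattern is to build the assign op (fed through a placeholder) inside the model's graph, before the session is created. Below is a minimal sketch of that pattern on a toy variable; the names (`embedding_ph`, `embedding_init`) and shapes are illustrative, and it uses `tf.compat.v1` so it runs on current TensorFlow:

```python
import numpy as np
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

vocab_size, embed_dim = 4, 3
initW = np.random.uniform(-0.25, 0.25, (vocab_size, embed_dim)).astype(np.float32)

graph = tf.Graph()
with graph.as_default():
    embedding = tf.compat.v1.get_variable(
        "embedding", [vocab_size, embed_dim], dtype=tf.float32)
    # Feed the big matrix through a placeholder instead of baking it
    # into the graph as a constant
    embedding_ph = tf.compat.v1.placeholder(
        tf.float32, [vocab_size, embed_dim], name="embedding_ph")
    # Create the assign op while the graph is still mutable,
    # in the SAME graph as the variable
    embedding_init = embedding.assign(embedding_ph)
    init_op = tf.compat.v1.global_variables_initializer()

with tf.compat.v1.Session(graph=graph) as session:
    session.run(init_op)
    loaded = session.run(embedding_init, feed_dict={embedding_ph: initW})
```

In the model above, that would mean creating `embedding_init = m.embedding.assign(embedding_ph)` where the model is built, and running it once inside `sv.managed_session()` with `initW` in the feed dict.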
Update: I updated the code following Nilesh Birari's suggestion: Full code. It yields no improvement in validation or test set perplexity; only training set perplexity improves.