
I am currently going through Google's TensorFlow cookbook:

This is a TensorFlow implementation of the skip-gram model.

On line 272, the author negates the similarity matrix (-sim[j, :]). I am a little bit confused about why we need to negate the similarity matrix in a skip-gram model. Any ideas?

for j in range(len(valid_words)):
    valid_word = word_dictionary_rev[valid_examples[j]]
    top_k = 5  # number of nearest neighbors
    nearest = (-sim[j, :]).argsort()[1:top_k+1]
    log_str = "Nearest to {}:".format(valid_word)
    for k in range(top_k):
        close_word = word_dictionary_rev[nearest[k]]
        score = sim[j, nearest[k]]
        log_str = "%s %s," % (log_str, close_word)
    print(log_str)

1 Answer


Let's go through this example step by step:

  • First, there's a similarity tensor. It is defined as a matrix of pairwise cosine similarities between embedding vectors:

    # Cosine similarity between words
    norm = tf.sqrt(tf.reduce_sum(tf.square(embeddings), 1, keep_dims=True))
    normalized_embeddings = embeddings / norm
    valid_embeddings = tf.nn.embedding_lookup(normalized_embeddings, valid_dataset)
    similarity = tf.matmul(valid_embeddings, normalized_embeddings, transpose_b=True)
    

    The matrix is computed between all validation words and all dictionary words, and contains values in [-1, 1]. In this example, the vocabulary size is 10000 and the validation set consists of 5 words, so the shape of the similarity matrix is (5, 10000). (A plain-NumPy version of this computation is sketched after this list.)

  • This matrix is evaluated to a numpy array sim:

    sim = sess.run(similarity, feed_dict=feed_dict)
    

    Consequently, sim.shape = (5, 10000) as well.

  • Next, this line:

    nearest = (-sim[j, :]).argsort()[1:top_k+1]
    

    ... computes the top_k nearest word indices to the current word j. Take a look at the numpy.argsort method: it sorts in ascending order, so negating the values first is just a NumPy way of sorting in descending order (there is a small NumPy demonstration of this after this list). If there were no minus, the result would be the top_k words furthest from the current validation word, which wouldn't indicate that word2vec has learned anything.

    Also note that the range is [1:top_k+1], not [:top_k], because the 0-th word is the current validation word itself. There's no point in printing that the closest word to "love" is... "love".

    The result of this line would be an array like [ 73 1684 850 1912 326], which corresponds to words sex, fine, youd, trying, execution.
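Not from the cookbook itself, but if it helps to see the shapes, here is a minimal NumPy sketch of the same cosine-similarity computation (the embedding values and validation indices below are made up for illustration):

    import numpy as np

    # Toy setup: a vocab_size x embedding_dim embedding matrix (random stand-in values)
    vocab_size, embedding_dim = 10000, 128
    rng = np.random.default_rng(0)
    embeddings = rng.normal(size=(vocab_size, embedding_dim))

    # L2-normalize every embedding vector, mirroring `embeddings / norm` above
    norm = np.sqrt(np.sum(np.square(embeddings), axis=1, keepdims=True))
    normalized_embeddings = embeddings / norm

    # Five validation words (the indices are arbitrary here)
    valid_examples = np.array([12, 345, 678, 901, 2345])
    valid_embeddings = normalized_embeddings[valid_examples]

    # Pairwise cosine similarities = dot products of unit vectors
    sim = valid_embeddings @ normalized_embeddings.T
    print(sim.shape)              # (5, 10000)
    print(sim.min(), sim.max())   # both within [-1, 1]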
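And a small, self-contained example (with invented similarity values) of why the minus sign gives a descending sort and why the slice starts at 1:

    import numpy as np

    # One row of `sim` for a single validation word (values invented);
    # index 2 plays the role of the word itself, so its similarity is 1.0.
    sim_row = np.array([0.1, 0.7, 1.0, -0.3, 0.4, 0.65])

    top_k = 3
    # argsort sorts ascending, so negating the values puts the largest
    # similarities first -- i.e. a descending sort by similarity.
    nearest = (-sim_row).argsort()[1:top_k + 1]
    print(nearest)           # [1 5 4] -> the 3 most similar words, skipping the word itself
    print(sim_row[nearest])  # [0.7  0.65 0.4 ]

    # Without the minus sign we would pick up the *least* similar words instead:
    furthest = sim_row.argsort()[:top_k]
    print(furthest)          # [3 0 4]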
