
I am currently going through Google's TensorFlow cookbook:

This is a TensorFlow implementation of the skip-gram model.

On line 272, the author negates the similarity matrix (-sim[j, :]). I am a little bit confused about why we need to negate the similarity matrix in a skip-gram model. Any ideas?

for j in range(len(valid_words)):
    valid_word = word_dictionary_rev[valid_examples[j]]
    top_k = 5  # number of nearest neighbors
    nearest = (-sim[j, :]).argsort()[1:top_k+1]
    log_str = "Nearest to {}:".format(valid_word)
    for k in range(top_k):
        close_word = word_dictionary_rev[nearest[k]]
        score = sim[j, nearest[k]]
        log_str = "%s %s," % (log_str, close_word)
    print(log_str)

1 Answer


Let's go through this example step by step:

  • First, there's a similarity tensor. It is defined as a matrix of pairwise cosine similarities between embedding vectors:

    # Cosine similarity between words
    norm = tf.sqrt(tf.reduce_sum(tf.square(embeddings), 1, keep_dims=True))
    normalized_embeddings = embeddings / norm
    valid_embeddings = tf.nn.embedding_lookup(normalized_embeddings, valid_dataset)
    similarity = tf.matmul(valid_embeddings, normalized_embeddings, transpose_b=True)
    

    The matrix is computed between all validation words and all dictionary words, and contains values in [-1, 1]. In this example, the vocabulary size is 10000 and the validation set consists of 5 words, so the shape of the similarity matrix is (5, 10000). (A plain-NumPy version of this computation is sketched after this list.)

  • This matrix is evaluated to a numpy array sim:

    sim = sess.run(similarity, feed_dict=feed_dict)
    

    Consequently, sim.shape = (5, 10000) as well.

  • Next, this line:

    nearest = (-sim[j, :]).argsort()[1:top_k+1]
    

    ... computes the top_k nearest word indices to the current word j. Take a look at the numpy.argsort method: it sorts in ascending order, so negating the values first is just a NumPy way of sorting in descending order (there is a small NumPy demonstration of this after this list). If there were no minus, the result would be the top_k words furthest from the current validation word, which wouldn't indicate that word2vec has learned anything.

    Also note that the range is [1:top_k+1], not [:top_k], because the 0-th word is the current validation word itself. There's no point in printing that the closest word to "love" is... "love".

    The result of this line would be an array like [ 73 1684 850 1912 326], which corresponds to words sex, fine, youd, trying, execution.
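Not from the cookbook itself, but if it helps to see the shapes, here is a minimal NumPy sketch of the same cosine-similarity computation (the embedding values and validation indices below are made up for illustration):

    import numpy as np

    # Toy setup: a vocab_size x embedding_dim embedding matrix (random stand-in values)
    vocab_size, embedding_dim = 10000, 128
    rng = np.random.default_rng(0)
    embeddings = rng.normal(size=(vocab_size, embedding_dim))

    # L2-normalize every embedding vector, mirroring `embeddings / norm` above
    norm = np.sqrt(np.sum(np.square(embeddings), axis=1, keepdims=True))
    normalized_embeddings = embeddings / norm

    # Five validation words (the indices are arbitrary here)
    valid_examples = np.array([12, 345, 678, 901, 2345])
    valid_embeddings = normalized_embeddings[valid_examples]

    # Pairwise cosine similarities = dot products of unit vectors
    sim = valid_embeddings @ normalized_embeddings.T
    print(sim.shape)              # (5, 10000)
    print(sim.min(), sim.max())   # both within [-1, 1]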
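And a small, self-contained example (with invented similarity values) of why the minus sign gives a descending sort and why the slice starts at 1:

    import numpy as np

    # One row of `sim` for a single validation word (values invented);
    # index 2 plays the role of the word itself, so its similarity is 1.0.
    sim_row = np.array([0.1, 0.7, 1.0, -0.3, 0.4, 0.65])

    top_k = 3
    # argsort sorts ascending, so negating the values puts the largest
    # similarities first -- i.e. a descending sort by similarity.
    nearest = (-sim_row).argsort()[1:top_k + 1]
    print(nearest)           # [1 5 4] -> the 3 most similar words, skipping the word itself
    print(sim_row[nearest])  # [0.7  0.65 0.4 ]

    # Without the minus sign we would pick up the *least* similar words instead:
    furthest = sim_row.argsort()[:top_k]
    print(furthest)          # [3 0 4]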
