I'm looking at the official TensorFlow example for Word2Vec. It builds a dictionary mapping every word to an index, then builds a reverse dictionary, and it is the reverse dictionary that is mainly used in the rest of the code.
The line in question:
reverse_dictionary = dict(zip(dictionary.values(), dictionary.keys()))
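To make clear what that line computes, here is a minimal sketch with a made-up toy dictionary (the words and indices are illustrative, not from the actual dataset):

# Toy word -> index mapping, purely for illustration.
dictionary = {'UNK': 0, 'the': 1, 'of': 2}

# A dict's .values() and .keys() iterate in the same order, so zip pairs
# each index with its word, producing an index -> word mapping.
reverse_dictionary = dict(zip(dictionary.values(), dictionary.keys()))
print(reverse_dictionary)  # {0: 'UNK', 1: 'the', 2: 'of'}

# An equivalent, arguably clearer spelling:
reverse_dictionary = {index: word for word, index in dictionary.items()}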
Full code block:
import collections

vocabulary_size = 50000

def build_dataset(words):
    # Reserve index 0 for out-of-vocabulary words; its count is filled in later.
    count = [['UNK', -1]]
    # Keep only the (vocabulary_size - 1) most frequent words.
    count.extend(collections.Counter(words).most_common(vocabulary_size - 1))
    # Assign each word an index in order of decreasing frequency.
    dictionary = dict()
    for word, _ in count:
        dictionary[word] = len(dictionary)
    # Encode the corpus as a list of indices, counting out-of-vocabulary hits.
    data = list()
    unk_count = 0
    for word in words:
        if word in dictionary:
            index = dictionary[word]
        else:
            index = 0  # dictionary['UNK']
            unk_count = unk_count + 1
        data.append(index)
    count[0][1] = unk_count
    # Invert the word -> index mapping into index -> word.
    reverse_dictionary = dict(zip(dictionary.values(), dictionary.keys()))
    return data, count, dictionary, reverse_dictionary

data, count, dictionary, reverse_dictionary = build_dataset(words)
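For context, here is a small example of how the returned values relate. The toy corpus below is made up for illustration; in the actual notebook, words comes from the text8 dataset (tie-broken word counts may order indices differently on older Python versions, hence the "e.g."):

# Hypothetical toy corpus, just for illustration.
words = ['the', 'cat', 'sat', 'on', 'the', 'mat']
data, count, dictionary, reverse_dictionary = build_dataset(words)

# data holds the corpus encoded as indices ('the', the most frequent word,
# gets index 1; index 0 is reserved for 'UNK').
print(data)  # e.g. [1, 2, 3, 4, 1, 5]

# The reverse dictionary decodes indices back into words.
print([reverse_dictionary[i] for i in data])
# ['the', 'cat', 'sat', 'on', 'the', 'mat']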
Full official implementation:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/udacity/5_word2vec.ipynb
This is the official implementation from TensorFlow, so there must be a good reason why they did it this way.