I am using the Word2Vec implementation from the gensim module to construct word embeddings for the sentences I have in a plain text file. Although the word happy is defined in the vocabulary, I get the error KeyError: "word 'happy' not in vocabulary". I tried applying the answers given to a similar question, but they did not work, so I am posting my own question.
Here is the code:
try:
    data = []
    with open(TXT_PATH, 'r', encoding='utf-8') as txt_file:
        for line in txt_file:
            for part in line.split(' '):
                data.append(part.strip())
    # When I debug, both of the words 'happy' and 'birthday' exist in the variable 'data'
    word2vec = Word2Vec(data, min_count=5, size=10000, window=5, workers=4)
    # Print result
    word_1 = 'happy'
    word_2 = 'birthday'
    print(f'Similarity between {word_1} and {word_2} thru word2vec: {word2vec.similarity(word_1, word_2)}')
except Exception as err:
    print(f'An error happened! Detail: {str(err)}')
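
One thing that may be worth checking is the shape of data. As far as I understand, Word2Vec expects an iterable of sentences, where each sentence is a list of tokens, while the code above builds a flat list of words. If each word is then iterated as if it were a sentence, its tokens are single characters. The sketch below illustrates this without gensim; build_vocab and the sample lines are hypothetical stand-ins, not gensim's actual vocabulary code.

```python
from collections import Counter


def build_vocab(sentences, min_count=1):
    """Hypothetical stand-in: count every token of every sentence."""
    counts = Counter(token for sentence in sentences for token in sentence)
    return {token for token, n in counts.items() if n >= min_count}


# Made-up sample lines standing in for the contents of the text file
lines = ["happy birthday to you", "happy new year"]

# Flat list of words, built the same way as 'data' in the question
flat = [part.strip() for line in lines for part in line.split(' ')]

# One list of tokens per line
nested = [line.split() for line in lines]

flat_vocab = build_vocab(flat)      # tokens are single characters
nested_vocab = build_vocab(nested)  # tokens are whole words

print('happy' in flat_vocab)
print('happy' in nested_vocab)
```

If this is indeed the cause, appending line.split() to data once per line, instead of appending each word separately, would give the model whole words as tokens. Note that min_count=5 would still drop any word that occurs fewer than five times.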