I would like to replace a random word in a sentence with its most similar word from word2vec, for example a word from the sentence question = 'Can I specify which GPU to use?'.
I used this recursive method because, after splitting the sentence with the split function, some words (like "to") are not in the word2vec model:
import gensim.models.keyedvectors as word2vec
import random as rd

# Load the pretrained GoogleNews vectors (binary word2vec format).
model = word2vec.KeyedVectors.load_word2vec_format('/Users/nbeau/Desktop/Word2vec/model/GoogleNews-vectors-negative300.bin', binary=True)

def similar_word(sentence, size):
    # Pick a random position in the tokenized sentence.
    pos_to_replace = rd.randint(0, size - 1)
    try:
        # most_similar returns (word, score) pairs; keep the top word.
        similarity = model.most_similar(positive=[sentence[pos_to_replace]])
        similarity = similarity[0][0]
    except KeyError:
        # The word is not in the model: retry with another random position.
        similarity, pos_to_replace = similar_word(sentence, size)
    return similarity, pos_to_replace

question = 'Can I specify which GPU to use?'
question = question.split()
size = len(question)
similarity, pos_to_replace = similar_word(question, size)
question[pos_to_replace] = similarity
I would like to know if there is a better method to avoid the words which are not in the word2vec model.
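One possible non-recursive alternative, as a minimal sketch: filter the tokens down to those the model knows before choosing a position, so no KeyError can occur and no retry is needed. The function name `replace_random_word` and its `rng` parameter are hypothetical; the sketch only assumes that `model` behaves like a gensim `KeyedVectors` object, i.e. that it supports `word in model` and `model.most_similar(...)`.

```python
import random

def replace_random_word(tokens, model, rng=random):
    """Replace one randomly chosen in-vocabulary token with its nearest neighbour.

    Assumption: `model` supports `word in model` and
    `model.most_similar(positive=[word], topn=1)`, as gensim KeyedVectors does.
    Returns (new_tokens, position); position is None if no token is known.
    """
    # Keep only the positions whose token the model actually knows.
    candidates = [i for i, tok in enumerate(tokens) if tok in model]
    if not candidates:
        return list(tokens), None

    pos = rng.choice(candidates)
    # most_similar returns (word, score) pairs sorted by decreasing similarity.
    best_word, _score = model.most_similar(positive=[tokens[pos]], topn=1)[0]

    new_tokens = list(tokens)  # copy, so the caller's list is left untouched
    new_tokens[pos] = best_word
    return new_tokens, pos
```

With the model loaded as above, this could be called as `new_question, pos = replace_random_word('Can I specify which GPU to use?'.split(), model)`. Unlike the recursive version, it also terminates when none of the words is in the model. Note that `split()` leaves punctuation attached (for example `use?`), so stripping it before the lookup, for instance with `tok.strip('?.,!')`, may let more tokens match.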