I am loading pre-trained vectors from a binary file generated from the word2vec C code with something like:
model_1 = Word2Vec.load_word2vec_format('vectors.bin', binary=True)
I am using those vectors to generate vector representations of sentences, and those sentences contain words that may not already have vectors in vectors.bin. For example, if vectors.bin has no associated vector for the word "yogurt", and I try
yogurt_vector = model_1['yogurt']
I get KeyError: 'yogurt', which makes good sense. What I want is to be able to take the sentence words that do not have corresponding vectors and add representations for them to model_1. I am aware from this post that you cannot continue to train the C vectors. Is there then a way to train a new model, say model_2, for the words with no vectors and merge model_2 with model_1?
Alternatively, is there a way to test if the model contains a word before I actually try to retrieve it, so that I can at least avoid the KeyError?
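To make that second question concrete, here is a rough sketch of the behavior I am after. It uses a plain dict as a stand-in for model_1, so it runs without vectors.bin; the names toy_vectors and sentence_vector are made up for illustration, and whether the loaded model supports the same kind of membership check is an assumption on my part.

```python
# Hypothetical stand-in for model_1: a plain dict mapping words to vectors.
toy_vectors = {
    "i": [1.0, 0.0],
    "like": [0.0, 1.0],
}

def sentence_vector(words, vectors, dim=2):
    """Average the vectors of known words and collect the words that have none."""
    total = [0.0] * dim
    known = 0
    missing = []
    for word in words:
        if word in vectors:  # membership test instead of catching KeyError
            total = [t + v for t, v in zip(total, vectors[word])]
            known += 1
        else:
            missing.append(word)
    if known:
        total = [t / known for t in total]
    return total, missing

vec, missing = sentence_vector(["i", "like", "yogurt"], toy_vectors)
print(vec, missing)
```

The missing list is what I would then want to train on or otherwise add to model_1.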