Is there still any solution to load the existing Googlenews W2v into gensim and finetune it with additional corpus?

Question

In order to fine-tune word2vec embeddings in gensim, the following piece of code worked with previous versions:

model = Word2Vec.load_word2vec_format('GoogleNews-vectors-negative300.bin.gz', binary=True)

However, I now get a message that Word2Vec.load_word2vec_format is deprecated: DeprecationWarning: Deprecated. Use gensim.models.KeyedVectors.load_word2vec_format instead. When I use

model = gensim.models.KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin.gz', binary=True)

and then try to fine-tune the model with the train method as below:

model.train(corpus, total_examples=len(corpus2), epochs=10)

I get the following error:

"AttributeError: 'Word2VecKeyedVectors' object has no attribute 'train'"

Is there still any solution to load the existing Googlenews W2V into gensim and fine-tune it with additional corpus?

In response to user:10473854: ignoring the warning does not work, as the method is already deprecated. Also, passing the path of the downloaded embeddings directly to Word2Vec does not work. Check this:

model = Word2Vec('GoogleNews-vectors-negative300.bin.gz')
model.wv.vocab

{'/': <gensim.models.keyedvectors.Vocab at 0x7ff6101c3940>,
'a': <gensim.models.keyedvectors.Vocab at 0x7ff6101c39e8>,
'e': <gensim.models.keyedvectors.Vocab at 0x7ff6101c3278>}

I don't believe it ever was directly possible (without a lot of other steps) to keep training a `Word2Vec` model created by `Word2Vec.load_word2vec_format()`. More commentary in my answer to this same question on the gensim discussion list: https://groups.google.com/d/msg/gensim/XFcVxPqMiOc/9ziDKJKZCAAJ — gojomo, Apr 02 '20 at 00:25

score 0 · Answer 1 · answered Apr 08 '20 at 20:54

I wrote a similar thing for GloVe vectors in this answer

It starts from GloVe vectors and fine-tunes them on an additional corpus using gensim.

The same can be done for the Google News vectors.

In short, you build a new model on your corpus and initialize its word vectors (the hidden-layer weights) with the pretrained vectors for every word in your corpus that is also present in the Google News vocabulary.
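A minimal sketch of that idea, assuming gensim 4.x (where the parameter is `vector_size` and the vocabulary is exposed as `key_to_index`; in 3.x these were `size` and `wv.vocab`). The names `seed_from_pretrained` and `fine_tune` are hypothetical helpers for illustration, not part of gensim:

```python
def seed_from_pretrained(model_wv, pretrained_kv):
    """Copy pretrained vectors into model_wv for every word both share.

    Both arguments are expected to expose `key_to_index` and `vectors`,
    as gensim 4.x KeyedVectors does. Returns the number of words copied.
    """
    copied = 0
    for word, idx in model_wv.key_to_index.items():
        src_idx = pretrained_kv.key_to_index.get(word)
        if src_idx is not None:
            model_wv.vectors[idx] = pretrained_kv.vectors[src_idx]
            copied += 1
    return copied


def fine_tune(sentences, pretrained_path, epochs=10):
    """sentences: list of token lists; pretrained_path: word2vec-format binary file."""
    # Imported here so seed_from_pretrained stays usable without gensim installed.
    from gensim.models import KeyedVectors, Word2Vec

    pretrained = KeyedVectors.load_word2vec_format(pretrained_path, binary=True)

    # A full, trainable model; its dimensionality must match the pretrained vectors.
    model = Word2Vec(vector_size=pretrained.vector_size, min_count=1)
    model.build_vocab(sentences)

    # Overwrite the random initial vectors for words known to the pretrained set.
    seed_from_pretrained(model.wv, pretrained)

    model.train(sentences, total_examples=model.corpus_count, epochs=epochs)
    return model
```

It would be called as `fine_tune(my_sentences, 'GoogleNews-vectors-negative300.bin.gz')`, where `my_sentences` is your own tokenized corpus. Two caveats: the resulting model only contains words that occur in your corpus, and the pretrained file carries only the word vectors, so the model's output weights still start untrained. gensim also has an `intersect_word2vec_format` method for a similar merge; where it lives differs between versions, so check the documentation for yours.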
