
In order to fine-tune word2vec embeddings in gensim, the following piece of code worked with previous versions:

model = Word2Vec.load_word2vec_format('GoogleNews-vectors-negative300.bin.gz', binary=True)

However, I now get a message that Word2Vec.load_word2vec_format is deprecated: DeprecationWarning: Deprecated. Use gensim.models.KeyedVectors.load_word2vec_format instead. When I use

model = gensim.models.KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin.gz', binary=True)

and then try to fine-tune the model with the train method as below:

model.train(corpus, total_examples=len(corpus), epochs=10)

I get the following error:

"AttributeError: 'Word2VecKeyedVectors' object has no attribute 'train'"

Is there still a way to load the existing GoogleNews word2vec vectors into gensim and fine-tune them on an additional corpus?

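In response to user:10473854: ignoring the warning does not work, as the method is already deprecated. Also, passing the path of the downloaded embeddings to Word2Vec does not load them: the path string is treated as the training corpus, so the vocabulary ends up containing only single characters. Check this: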

model = Word2Vec('GoogleNews-vectorsnegative300.bin.gz')
model.wv.vocab

{'/': <gensim.models.keyedvectors.Vocab at 0x7ff6101c3940>,
'a': <gensim.models.keyedvectors.Vocab at 0x7ff6101c39e8>,
'e': <gensim.models.keyedvectors.Vocab at 0x7ff6101c3278>}
  • I don't believe it ever was directly possible (without a lot of other steps) to keep training a `Word2Vec` model created by `Word2Vec.load_word2vec_format()`. More commentary in my answer to this same question on the gensim discussion list: https://groups.google.com/d/msg/gensim/XFcVxPqMiOc/9ziDKJKZCAAJ – gojomo Apr 02 '20 at 00:25

1 Answer


I wrote up something similar for GloVe vectors in this answer.

Basically, it starts from the GloVe vectors and fine-tunes them on an additional corpus using gensim.

In a similar fashion, it can be done for google news vectors as well.

In short, you need to initialize the hidden-layer weights with the pretrained vectors for those words in your corpus that are already present in the Google News vocabulary.
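To make the initialization step concrete, here is a minimal, library-agnostic sketch in plain Python. It is not gensim's own API: the function name `init_weights`, the toy `pretrained` dictionary, and the small uniform initialization for unseen words are all assumptions made for illustration. With gensim you would copy the resulting rows into the model's input weights after building the vocabulary, and only then call train.

```python
import random


def init_weights(corpus_vocab, pretrained, dim, seed=0):
    """Build one starting vector per corpus word (hypothetical helper).

    Words that also exist in `pretrained` start from the pretrained
    vector; all other words get small random values.
    """
    rng = random.Random(seed)
    weights = {}
    reused = 0
    for word in corpus_vocab:
        if word in pretrained:
            # Copy the list so later updates do not modify the source vectors.
            weights[word] = list(pretrained[word])
            reused += 1
        else:
            # Assumed initialization: small uniform values around zero.
            weights[word] = [(rng.random() - 0.5) / dim for _ in range(dim)]
    return weights, reused


# Toy stand-ins for illustration only, not real GoogleNews vectors.
pretrained = {"king": [0.1, 0.2, 0.3], "queen": [0.4, 0.5, 0.6]}
corpus_vocab = ["king", "queen", "gensim"]

weights, reused = init_weights(corpus_vocab, pretrained, dim=3)
print(reused, "of", len(corpus_vocab), "words start from pretrained vectors")
```

Words missing from the pretrained set (here "gensim") are learned from scratch during fine-tuning, while the overlapping words start from their pretrained positions and are only adjusted by the new corpus.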

ashutosh singh