
I am trying to reproduce the results from this repository: https://github.com/danielricks/scholar. I do not have Linux and so cannot install the word2vec package the code uses, but it's only used for loading a pretrained word2vec model anyway, so Gensim should do the job.

The problem is that the pretrained model used by scholar is stored in a pickle file (provided in the Readme under "processed files"), postagged_wikipedia_for_word2vec_30kn3kv.pkl.

When I try to open this file I get ModuleNotFoundError: No module named 'word2vec'. I opened the pickle file in Notepad and changed word2vec near the beginning to gensim.models.word2vec, but then I got ModuleNotFoundError: No module named 'gensim.models.word2vec'.

I use Windows and so word2vec is not really feasible to install. That is why I am trying to come up with a way to use Gensim here.

Mobeus Zoom

2 Answers


Try importing gensim in the script in which you are trying to load the pickle object.

A pickle stores only a reference to each object's module and class name, not the module's code, so the modules those objects depend on must be available for import in the script that loads the pickle.
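As a minimal sketch of that behaviour, the snippet below pickles an object whose class lives in a throwaway module, then tries to unpickle it with the module missing. The module name fake_word2vec and the class WordVectors are hypothetical stand-ins, not names from the scholar project or from Gensim.

```python
import pickle
import sys
import types

MODULE_NAME = "fake_word2vec"  # hypothetical stand-in for the real package


class WordVectors:
    """Toy class standing in for whatever the original package pickled."""

    def __init__(self, vocab):
        self.vocab = vocab


# Make pickle record the class as "fake_word2vec.WordVectors".
WordVectors.__module__ = MODULE_NAME
WordVectors.__qualname__ = "WordVectors"

# Register an in-memory module so the class can be found under that name.
fake_module = types.ModuleType(MODULE_NAME)
fake_module.WordVectors = WordVectors
sys.modules[MODULE_NAME] = fake_module

payload = pickle.dumps(WordVectors(["cat", "dog"]))

# The pickle contains the module name as text, not the class's code.
stores_module_name = MODULE_NAME.encode() in payload

# Simulate a machine where the package is not installed.
del sys.modules[MODULE_NAME]
try:
    pickle.loads(payload)
    load_error = None
except ModuleNotFoundError as exc:
    load_error = str(exc)

# With the module importable again, the same bytes load fine.
sys.modules[MODULE_NAME] = fake_module
restored = pickle.loads(payload)

print(stores_module_name, load_error, restored.vocab)
```

The failure in the middle step is the same kind of ModuleNotFoundError reported in the question: the unpickler tries to import the module named in the file and cannot find it.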

FinleyGibson

If you need to load this module's pickle file, based on class objects from whatever word2vec package it was using, you'll need to have its same word2vec code available. (The fact that Gensim also offers some similar functionality doesn't mean its code is sufficient to unpickle objects based on other code.)

You could try one or more of the following:

  • using the larger .bin file included with this project, which is likely in the data format that Gensim's KeyedVectors class can load with load_word2vec_format(filename, binary=True). (What they considered too large to be practical when this code was created 5+ years ago might be no problem on your current machine. You'd likely also need to update other lines of code that were using the old word2vec package's methods.)

  • switching to Linux - many of the libraries involved in Python text processing & machine-learning are primarily developed, tested, and used on Linux, so you generally have far fewer installation/dependency/configuration problems there

  • figuring out how to install the word2vec.c executable on Windows - where the other answer hits a 'bin\word2vec': doesn't exist error, it's because that native-compiled executable isn't present, and it seems this project's Python word2vec module depends on it. Someone somewhere may have figured out how to do that compilation. (The asker you've linked doesn't look like they tried that separate manual step.)
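For the first option, a rough sketch of loading the .bin file with Gensim might look like the following. It assumes Gensim is installed and that the file really is in the word2vec binary format; the helper name load_vectors and the path in the usage comment are hypothetical.

```python
def load_vectors(bin_path):
    """Load word2vec-format binary vectors using Gensim's KeyedVectors.

    Gensim is imported inside the function so that this file can be
    imported even on a machine where Gensim is not installed yet.
    """
    from gensim.models import KeyedVectors

    return KeyedVectors.load_word2vec_format(bin_path, binary=True)


# Usage (the path is a placeholder; point it at the project's .bin file):
# vectors = load_vectors("path/to/vectors.bin")
# vectors.most_similar("some_token_from_the_vocabulary")
```

The returned object only covers the vector lookup and similarity queries, so any scholar code that calls methods of the original word2vec package would still need to be adapted by hand.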

gojomo