
I trained a doc2vec model with Python2 and I would like to use it in Python3.

When I try to load it in Python 3, I get:

Doc2Vec.load('my_doc2vec.pkl')

UnicodeDecodeError: 'ascii' codec can't decode byte 0xb0 in position 0: ordinal not in range(128)
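The same error can be reproduced without gensim. The sketch below uses only the standard library and assumes a hand-written byte string (`PY2_PICKLE`, a hypothetical payload, not data from the model file) that mimics how Python 2 pickles a `str` containing the byte 0xb0. The helper name `try_default_load` is also made up for illustration.

```python
import pickle

# Hand-written bytes imitating a Python 2 protocol-2 pickle of the str '\xb0abc':
# PROTO 2, SHORT_BINSTRING of length 4, BINPUT 0, STOP.
# Illustrative payload only; it is not taken from the doc2vec file.
PY2_PICKLE = b'\x80\x02U\x04\xb0abcq\x00.'


def try_default_load(payload):
    """Return the error message of a default Python 3 load, or None on success."""
    try:
        # Python 3 decodes Python 2 str objects as ASCII unless told otherwise.
        pickle.loads(payload)
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


message = try_default_load(PY2_PICKLE)
print(message)
```

Python 3 decodes 8-bit strings from Python 2 pickles with the ASCII codec by default, so any byte above 0x7f fails in this way.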

It seems to be related to a pickle compatibility issue, which I tried to solve by doing:

with open('my_doc2vec.pkl', 'rb') as inf:
    data = pickle.load(inf)
data.save('my_doc2vec_python3.pkl')

Gensim saved other files, which I renamed as well so that they can be found when calling

de = Doc2Vec.load('my_doc2vec_python3.pkl')

The load() no longer fails with UnicodeDecodeError, but inference afterwards gives meaningless results.

I can't easily re-train it using Gensim in Python 3, because I used this model to create derived data, so I would have to re-run a long and complex pipeline.

How can I make the doc2vec model compatible with Python 3?

Bernard

1 Answer


Answering my own question, this answer worked for me.

Here are the steps in a bit more detail:

  1. download the gensim source code, e.g. clone it from the repository
  2. in gensim/utils.py, edit the function unpickle to add the encoding parameter:

     return _pickle.loads(f.read(), encoding='latin1')
    
  3. using Python 3 and the modified gensim, load the model:

    de = Doc2Vec.load('my_doc2vec.pkl')
    
  4. save it:

    de.save('my_doc2vec_python3.pkl')
    

The re-saved model should now be loadable in Python 3 with unmodified gensim.
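The effect of the `encoding='latin1'` argument can be sketched with the standard library alone. This is a minimal illustration, not gensim's own code: `load_python2_pickle` and `PY2_PICKLE` are hypothetical names, and the payload is a hand-written imitation of a Python 2 pickle rather than a real model. It mirrors the load-then-save round trip of steps 3 and 4.

```python
import os
import pickle
import tempfile


def load_python2_pickle(path):
    """Load a pickle written by Python 2, decoding its 8-bit strings as latin1."""
    with open(path, 'rb') as f:
        # latin1 maps every byte 0x00-0xff to one code point, so decoding cannot fail.
        return pickle.load(f, encoding='latin1')


# Hand-written imitation of a Python 2 protocol-2 pickle of the str '\xb0abc'.
PY2_PICKLE = b'\x80\x02U\x04\xb0abcq\x00.'

with tempfile.TemporaryDirectory() as tmp:
    old_path = os.path.join(tmp, 'py2.pkl')
    new_path = os.path.join(tmp, 'py3.pkl')

    with open(old_path, 'wb') as f:
        f.write(PY2_PICKLE)

    # Load with the explicit encoding (the analogue of step 3).
    value = load_python2_pickle(old_path)

    # Re-save with Python 3's pickle (the analogue of step 4).
    with open(new_path, 'wb') as f:
        pickle.dump(value, f)

    # The re-saved file now loads with the default settings.
    with open(new_path, 'rb') as f:
        reloaded = pickle.load(f)

print(repr(value), reloaded == value)
```

The Python documentation notes that `encoding='latin1'` is required for unpickling NumPy arrays pickled by Python 2, which is presumably why it matters for a gensim model.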

Bernard