Doc2Vec model Python 3 compatibility

Question

I trained a doc2vec model with Python 2 and I would like to use it in Python 3.

When I try to load it in Python 3, I get:

Doc2Vec.load('my_doc2vec.pkl')

UnicodeDecodeError: 'ascii' codec can't decode byte 0xb0 in position 0: ordinal not in range(128)

It seems to be related to a pickle compatibility issue, which I tried to solve with:

import pickle

with open('my_doc2vec.pkl', 'rb') as inf:
    data = pickle.load(inf)
data.save('my_doc2vec_python3.pkl')

Gensim saved other files, which I renamed as well so that they can be found when calling

de = Doc2Vec.load('my_doc2vec_python3.pkl')

load() no longer fails with UnicodeDecodeError, but inference afterwards gives meaningless results.

I can't easily retrain the model with Gensim in Python 3, because I used it to create derived data, so I would have to re-run a long and complex pipeline.

How can I make the doc2vec model compatible with Python 3?

score 2 · Answer 1 · edited May 23 '17 at 12:01

Answering my own question, this answer worked for me.

Here are the steps in a bit more detail:

download the gensim source code, e.g. by cloning the repository
in gensim/utils.py, edit the unpickle function to pass the encoding parameter:
```
 return _pickle.loads(f.read(), encoding='latin1')
```
using Python 3 and the modified gensim, load the model:
```
de = Doc2Vec.load('my_doc2vec.pkl')
```
save it:
```
de.save('my_doc2vec_python3.pkl')
```
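
To illustrate why the encoding parameter in the unpickle edit matters, here is a minimal standard-library sketch. It does not use gensim; the byte string `PY2_PICKLE` is a hand-written protocol-2 pickle standing in for what Python 2 would write for an 8-bit string, and the helper names are made up for this example. Python 3 decodes such strings as ASCII by default, which fails on bytes above 0x7f, while latin1 maps every byte to a character.

```python
import pickle

# Hand-written protocol-2 pickle of the Python 2 byte string '\xb0':
# PROTO 2, SHORT_BINSTRING (length 1, byte 0xb0), BINPUT 0, STOP.
# This is an illustrative stand-in, not data taken from a real model file.
PY2_PICKLE = b"\x80\x02U\x01\xb0q\x00."


def load_default(data):
    """Load with Python 3 defaults: 8-bit strings are decoded as ASCII."""
    return pickle.loads(data)


def load_latin1(data):
    """Load while decoding Python 2 8-bit strings as latin1."""
    return pickle.loads(data, encoding="latin1")


try:
    load_default(PY2_PICKLE)
    default_error = None
except UnicodeDecodeError as exc:
    # Same class of failure as in the question: byte 0xb0 is not ASCII.
    default_error = exc

# latin1 maps each byte 0x00-0xff to the code point with the same value,
# so the original byte can be recovered with .encode("latin1").
recovered = load_latin1(PY2_PICKLE)
```

Here `default_error` holds the UnicodeDecodeError from the default load, and `recovered` is a one-character str whose latin1 encoding is the original byte.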

This model should now be loadable in Python 3 with unmodified gensim.
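
If editing the gensim source is inconvenient, the same effect might be achievable by swapping the function at runtime. The following is only a sketch under assumptions: `unpickle_latin1` and `patched_unpickle` are hypothetical helpers written for this example, it assumes `Doc2Vec.load` looks up `gensim.utils.unpickle` at call time, and it reads the file with plain `open()`, so it would not handle compressed or remote files the way gensim's own loader may.

```python
import contextlib
import pickle


def unpickle_latin1(fname):
    """Read a pickle written by Python 2, decoding 8-bit strings as latin1."""
    with open(fname, "rb") as f:
        return pickle.load(f, encoding="latin1")


@contextlib.contextmanager
def patched_unpickle(utils_module):
    """Temporarily replace utils_module.unpickle with unpickle_latin1.

    utils_module is expected to be gensim.utils (an assumption; any object
    with an 'unpickle' attribute works for this sketch).
    """
    original = utils_module.unpickle
    utils_module.unpickle = unpickle_latin1
    try:
        yield
    finally:
        # Always restore the original function, even if loading fails.
        utils_module.unpickle = original


# Intended usage, assuming gensim is installed (not executed here):
#
#   import gensim.utils
#   from gensim.models import Doc2Vec
#
#   with patched_unpickle(gensim.utils):
#       de = Doc2Vec.load('my_doc2vec.pkl')
#   de.save('my_doc2vec_python3.pkl')
```

The context manager keeps the patch scoped to the one load call, so the rest of the program still uses the library's unmodified function.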
