1

I am trying to load the pre-trained word2vec model using the command below, but I get a UnicodeDecodeError. I searched around but could not find a working solution. How can I get to the bottom of this?

python -m spacy init-model en /tmp/google_news_vectors --vectors-loc ~/Downloads/GoogleNews-vectors-negative300.bin.gz


UnicodeDecodeError: 'utf-8' codec can't decode byte 0x94 in position 7: invalid start byte
Ram K
  • Possible duplicate of [UnicodeDecodeError: 'utf8' codec can't decode byte 0xa5 in position 0: invalid start byte](https://stackoverflow.com/questions/22216076/unicodedecodeerror-utf8-codec-cant-decode-byte-0xa5-in-position-0-invalid-s) – Wiktor Stribiżew Aug 26 '19 at 07:00

1 Answer

3

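spaCy's `init-model` command expects the vectors in the word2vec text format, not the binary format. The `.bin.gz` file is binary, so reading it as UTF-8 text fails with the `UnicodeDecodeError` above: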

https://spacy.io/api/cli#init-model

For how to convert the binary model, see: https://stackoverflow.com/a/33183634/461847
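
The linked answer does the conversion with gensim: `KeyedVectors.load_word2vec_format(..., binary=True)` followed by `save_word2vec_format(..., binary=False)`. If you would rather not install gensim, here is a minimal standard-library sketch of the same idea. It assumes the usual word2vec binary layout (a `vocab_size dim` header line, then each word terminated by a space and followed by `dim` little-endian float32 values); the function name and the paths are made up for illustration, and it reads byte by byte, so expect it to be slow on a file this large.

```python
import gzip
import struct


def convert_word2vec_bin_to_txt(bin_path, txt_path):
    """Convert a word2vec binary file (optionally gzipped) to the text format."""
    # Assumption: a ".gz" suffix means the file is gzip-compressed.
    opener = gzip.open if str(bin_path).endswith(".gz") else open
    with opener(bin_path, "rb") as src, open(txt_path, "w", encoding="utf-8") as dst:
        # Header line: "<vocab_size> <vector_size>\n"
        vocab_size, dim = map(int, src.readline().split())
        dst.write(f"{vocab_size} {dim}\n")

        fmt = f"<{dim}f"  # dim little-endian 32-bit floats
        nbytes = struct.calcsize(fmt)

        for _ in range(vocab_size):
            # Each word is a byte string terminated by a single space.
            chars = []
            while True:
                ch = src.read(1)
                if ch == b" ":
                    break
                if ch == b"":
                    raise EOFError("unexpected end of file while reading a word")
                if ch != b"\n":  # some writers emit a newline after each vector
                    chars.append(ch)
            word = b"".join(chars).decode("utf-8", errors="replace")

            # The vector follows immediately as raw float32 values.
            vector = struct.unpack(fmt, src.read(nbytes))
            dst.write(word + " " + " ".join(f"{v:.6g}" for v in vector) + "\n")
    return vocab_size, dim


if __name__ == "__main__":
    # Hypothetical paths; adjust to wherever the download lives.
    convert_word2vec_bin_to_txt(
        "GoogleNews-vectors-negative300.bin.gz",
        "GoogleNews-vectors-negative300.txt",
    )
```

After converting, rerun the same `init-model` command with `--vectors-loc` pointing at the resulting text file instead of the `.bin.gz`.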

aab