I recently downloaded the fastText pretrained model for English. I got two files:

  1. wiki.en.vec
  2. wiki.en.bin

I am not sure what the difference between the two files is.

Bhushan Pant

2 Answers

The .vec files contain only the aggregated word vectors, in plain text. The .bin files additionally contain the model parameters and, crucially, the vectors for all the n-grams.
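To make the "plain text" part concrete, here is a minimal sketch of a reader for that layout, assuming the usual .vec structure: a header line with the word count and the dimension, followed by one line per word holding the word and its vector components separated by spaces. The function name `read_vec` and the toy input are hypothetical, made up for illustration; the numbers are not real fastText vectors.

```python
import io
from typing import Dict, List, TextIO, Tuple


def read_vec(stream: TextIO) -> Tuple[int, int, Dict[str, List[float]]]:
    """Parse the plain-text .vec layout: a 'count dim' header, then 'word v1 ... vdim' per line."""
    count, dim = (int(x) for x in stream.readline().split())
    vectors: Dict[str, List[float]] = {}
    for line in stream:
        # Lines may end with a trailing space and/or \r\n; strip those first.
        # Splitting on single spaces keeps unusual tokens from being broken apart.
        parts = line.rstrip("\r\n ").split(" ")
        if len(parts) < dim + 1:
            continue  # skip blank or malformed lines
        word = " ".join(parts[:-dim])  # everything before the last `dim` fields
        vectors[word] = [float(x) for x in parts[-dim:]]
    return count, dim, vectors


# Toy input with made-up numbers, standing in for a real file such as wiki.en.vec.
sample = io.StringIO("2 3\nthe 0.5 -0.1 0.0 \nking 0.3 0.3 -0.2 \n")
count, dim, vectors = read_vec(sample)
print(count, dim, vectors["king"])
```

For a real file you would pass `open("wiki.en.vec", encoding="utf-8")` instead of the `StringIO`; note that holding every vector of a full Wikipedia vocabulary as Python lists may need a lot of memory.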

So if you want to encode words that were not in the training vocabulary by using those n-grams (FastText's famous "subword information"), you need an API that can handle FastText .bin files (most, however, only support the .vec files).
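The subword idea can be illustrated with a toy sketch: fastText wraps a word in `<` and `>` boundary markers, splits it into character n-grams (lengths 3 to 6 by default), and builds a vector for an unseen word from its n-gram vectors. Everything below is an assumption for illustration: `char_ngrams`, `oov_vector`, and `toy_table` are hypothetical, and the real model stores n-gram vectors in a hashed bucket table (working on UTF-8 bytes) rather than a dict, so the numbers will not match the library. With the .bin file and the official Python module, `model.get_word_vector(word)` does this lookup for you.

```python
from typing import Dict, List


def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> List[str]:
    """All character n-grams of '<word>' with lengths minn..maxn (boundary markers as in fastText)."""
    token = f"<{word}>"
    return [token[i:i + n]
            for n in range(minn, maxn + 1)
            for i in range(len(token) - n + 1)]


def oov_vector(word: str, ngram_vectors: Dict[str, List[float]], dim: int,
               minn: int = 3, maxn: int = 6) -> List[float]:
    """Average the vectors of the word's n-grams (toy version of an out-of-vocabulary lookup)."""
    # Simplification: real fastText hashes every n-gram into a fixed-size bucket table,
    # so no n-gram is ever "missing"; this hypothetical dict lookup just skips unknown ones.
    found = [ngram_vectors[g] for g in char_ngrams(word, minn, maxn) if g in ngram_vectors]
    if not found:
        return [0.0] * dim
    return [sum(v[k] for v in found) / len(found) for k in range(dim)]


print(char_ngrams("where", 3, 3))
toy_table = {"<wh": [1.0, 0.0], "whe": [0.0, 1.0]}  # made-up n-gram vectors
print(oov_vector("where", toy_table, dim=2, minn=3, maxn=3))
```

This is also why the .vec file alone is not enough for unseen words: it only stores the final per-word vectors, not the n-gram table those vectors were built from.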

fnl
  • How do you actually work with the `.bin` file? I've tried `open(FILENAME, "rb")`, but then I'm not sure how to access the weight matrix after that. – information_interchange May 06 '20 at 16:01
  • @information_interchange did you find the answer to this question? – Bob van Luijt May 25 '20 at 12:37
  • Please take a look at the official documentation for the Python API: https://github.com/facebookresearch/fastText/tree/master/python#saving-and-loading-a-model-object – fnl May 26 '20 at 05:42
  • @information_interchange you can do `model = fasttext.load_model("embedding.bin")` to load a model object. – dapperdan Aug 23 '20 at 23:15

As the documentation says,

model.vec is a text file containing the word vectors, one per line. model.bin is a binary file containing the parameters of the model along with the dictionary and all hyper parameters.

In other words, the .vec file is an ordinary plain-text file, and you can use it in other applications (for example, to exchange data between your FastText model and a Word2Vec model, since the .vec layout is similar to the text format Word2Vec produces). The .bin file, in contrast, is what you need if you want to continue training the vectors or restart the optimization.
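As a sketch of that interchange, the snippet below writes a dict of vectors in the same text layout (a `count dim` header, then one `word v1 ... vdim` line per word). The function `write_vec` and the sample vectors are hypothetical; a word2vec-format text reader, such as gensim's `KeyedVectors.load_word2vec_format(path, binary=False)`, should then be able to load the result, though exact float formatting may differ from what fastText itself writes.

```python
import io
from typing import Dict, List, TextIO


def write_vec(vectors: Dict[str, List[float]], stream: TextIO) -> None:
    """Write vectors in the plain-text .vec / word2vec text layout."""
    if not vectors:
        raise ValueError("no vectors to write")
    dim = len(next(iter(vectors.values())))
    stream.write(f"{len(vectors)} {dim}\n")  # header: word count and dimension
    for word, vec in vectors.items():
        if len(vec) != dim:
            raise ValueError(f"{word!r} has {len(vec)} dims, expected {dim}")
        # str(float(x)) gives the shortest text that round-trips the float exactly.
        stream.write(word + " " + " ".join(str(float(x)) for x in vec) + "\n")


buf = io.StringIO()
write_vec({"king": [0.3, -0.2], "queen": [0.25, 0.1]}, buf)  # made-up vectors
print(buf.getvalue())
```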

Amir