I am trying to load FastText ("fasttext-wiki-news-subwords-300", via the gensim.downloader API) and save it as a model so that I can deploy it to production. The file is 1.2 GB, which won't be practical to use in production. Can anyone suggest an approach to save and load the model for production?
2 Answers
You can use the library https://github.com/avidale/compress-fasttext, which is a wrapper around Gensim that can serve compressed versions of unsupervised FastText models. The compressed versions can be orders of magnitude smaller (e.g. 20 MB), with a tolerable loss in quality.
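For illustration, a minimal sketch of compressing the questioner's model with that library, based on its README (the prune_ft_freq helper and its pq argument are taken from that README and may differ across library versions):

import gensim.downloader as api
import compress_fasttext

# one-time, offline: load the full ~1.2 GB model
big_model = api.load('fasttext-wiki-news-subwords-300')

# prune rare words and quantize the remaining vectors
# (helper name/arguments per the README; may vary by version)
small_model = compress_fasttext.prune_ft_freq(big_model, pq=True)
small_model.save('ft-wiki-news-compressed.bin')

# in production: load only the small artifact
small_model = compress_fasttext.models.CompressedFastTextKeyedVectors.load(
    'ft-wiki-news-compressed.bin')
print(small_model['inexpensive'])  # subword synthesis still works for OOV words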

Great library! If you're not already doing something similar, you might want to consider offering, as an optional step, the sort of postprocessing (mostly recentering) described in this 'All-But-The-Top' paper – https://arxiv.org/abs/1702.01417 – which might synergize well with compression (maybe improving it, maybe offsetting the occasional compression-related decay in evaluations). – gojomo Mar 17 '22 at 19:49
In order to have clarity over exactly what you're getting, and in what format, I strongly recommend downloading things like sets of pretrained vectors from their original sources, rather than via Gensim's gensim.downloader convenience methods. (That API also, against most users' expectations & best packaging hygiene, will download & run arbitrary other code that's not part of Gensim's version-controlled source repository or its official PyPI package. See project issue #2283.)
For example, you could grab the raw vector files directly from: https://fasttext.cc/docs/en/english-vectors.html
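For instance, a quick sketch of loading one of those files with Gensim (the filename is as listed on that download page; adjust it to whichever archive you grab and unzip):

from gensim.models import KeyedVectors

# wiki-news-300d-1M.vec is the unzipped plain-text vectors file from
# https://fasttext.cc/docs/en/english-vectors.html
wordvecs = KeyedVectors.load_word2vec_format('wiki-news-300d-1M.vec', binary=False)
print(wordvecs.most_similar('apple', topn=3))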
The tool from david-dale's answer looks interesting for its radical compression, and if you can verify the compressed versions still work well for your purposes, it may be an ideal approach for memory-limited production deployments.
I would also consider:
- A production machine with enough RAM to load the full model may not be too costly, and with these sorts of vector models, typical access patterns mean you essentially always want the full model in RAM, with no virtual-memory swapping at all. If your deployment is in a web server, there are memory-mapping tricks that can let many processes share the same singly-loaded copy of the model, avoiding time- and memory-consuming redundant reloads (see the sketch after this list). See this answer for an approach that works with Word2Vec (though that may need some adaptation for FastText & recent Gensim versions).
- If you don't need the FastText-specific subword-based synthesis of vectors for out-of-vocabulary words, you can save just the full-word vectors to a file in a simple format, then reload only a small subset of the leading vectors (the most common words) using the limit option of load_word2vec_format(). For example:
from gensim.models import KeyedVectors

# save only the full-word vectors from a loaded FastText model
ft_model.wv.save_word2vec_format('wvonly.txt', binary=False)

# ... then, later/elsewhere:
# load only the first 50,000 word-vectors (the format typically stores
# the most-frequent words first, so this keeps the most common ones)
wordvecs = KeyedVectors.load_word2vec_format('wvonly.txt', binary=False, limit=50000)
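And here's a sketch of the memory-mapping trick mentioned in the first bullet, using standard Gensim save/load (the file names are illustrative):

from gensim.models import KeyedVectors

# one-time: re-save in Gensim's native format, which puts the big numpy
# arrays in separate files that can be memory-mapped on load
wordvecs = KeyedVectors.load_word2vec_format('wvonly.txt', binary=False)
wordvecs.save('wvonly.kv')

# in each web-server worker: mmap the arrays read-only, so the OS can
# share one physical copy of the data across all processes
shared = KeyedVectors.load('wvonly.kv', mmap='r')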
