I am trying to load FastText ("fasttext-wiki-news-subwords-300", via the gensim.downloader API) and save it as a model so that I can deploy it to production. The file is 1.2 GB, which won't be practical to use in production. Can anyone suggest an approach to save and load the model for production?
2 Answers
You can use the library https://github.com/avidale/compress-fasttext, which is a wrapper around Gensim that can serve compressed versions of unsupervised FastText models. The compressed versions can be orders of magnitude smaller (e.g. 20 MB), with a tolerable loss in quality.
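For illustration, a minimal sketch of compressing the questioner's model with that library, based on its README (the prune_ft_freq helper and its pq argument are taken from that README and may differ across library versions):

import gensim.downloader as api
import compress_fasttext

# one-time, offline: load the full ~1.2 GB model
big_model = api.load('fasttext-wiki-news-subwords-300')

# prune rare words and quantize the remaining vectors
# (helper name/arguments per the README; may vary by version)
small_model = compress_fasttext.prune_ft_freq(big_model, pq=True)
small_model.save('ft-wiki-news-compressed.bin')

# in production: load only the small artifact
small_model = compress_fasttext.models.CompressedFastTextKeyedVectors.load(
    'ft-wiki-news-compressed.bin')
print(small_model['inexpensive'])  # subword synthesis still works for OOV words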

Great library! If you're not already doing something similar, you might want to consider offering, as an optional step, the sort of postprocessing (mostly recentering) described in this 'All-But-The-Top' paper – https://arxiv.org/abs/1702.01417 – which might synergize well with compression (maybe improving it, maybe offsetting the occasional compression-related decay in evaluations). – gojomo Mar 17 '22 at 19:49
In order to have clarity over exactly what you're getting, and in what format, I strongly recommend downloading things like sets of pretrained vectors from their original sources, rather than via Gensim's gensim.downloader convenience methods. (That API also, against most users' expectations & best packaging hygiene, will download & run arbitrary other code that's not part of Gensim's version-controlled source repository or its official PyPI package. See project issue #2283.)
For example, you could grab the raw vector files directly from: https://fasttext.cc/docs/en/english-vectors.html
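For instance, a quick sketch of loading one of those files with Gensim (the filename is as listed on that download page; adjust it to whichever archive you grab and unzip):

from gensim.models import KeyedVectors

# wiki-news-300d-1M.vec is the unzipped plain-text vectors file from
# https://fasttext.cc/docs/en/english-vectors.html
wordvecs = KeyedVectors.load_word2vec_format('wiki-news-300d-1M.vec', binary=False)
print(wordvecs.most_similar('apple', topn=3))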
The tool from david-dale's answer looks interesting for its radical compression, and if you can verify the compressed versions still work well for your purposes, it may be an ideal approach for memory-limited production deployments.
I would also consider:
- A production machine with enough RAM to load the full model may not be too costly, and with these sorts of vector models, typical access patterns mean you essentially always want the full model in RAM, with no virtual-memory swapping at all. If your deployment is in a web server, there are memory-mapping tricks that can let many processes share the same singly-loaded copy of the model, avoiding time- and memory-consuming redundant reloads (see the sketch after this list). See this answer for an approach that works with Word2Vec (though that may need some adaptation for FastText & recent Gensim versions).
- If you don't need the FastText-specific subword-based synthesis of vectors for out-of-vocabulary words, you can save just the full-word vectors to a file in a simple format, then reload only a small subset of the leading vectors (the most common words) using the limit option of load_word2vec_format(). For example:
from gensim.models import KeyedVectors

# save only the full-word vectors from a loaded FastText model
ft_model.wv.save_word2vec_format('wvonly.txt', binary=False)

# ... then, later/elsewhere:
# load only the first 50,000 word-vectors (the format typically stores
# the most-frequent words first, so this keeps the most common ones)
wordvecs = KeyedVectors.load_word2vec_format('wvonly.txt', binary=False, limit=50000)
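And here's a sketch of the memory-mapping trick mentioned in the first bullet, using standard Gensim save/load (the file names are illustrative):

from gensim.models import KeyedVectors

# one-time: re-save in Gensim's native format, which puts the big numpy
# arrays in separate files that can be memory-mapped on load
wordvecs = KeyedVectors.load_word2vec_format('wvonly.txt', binary=False)
wordvecs.save('wvonly.kv')

# in each web-server worker: mmap the arrays read-only, so the OS can
# share one physical copy of the data across all processes
shared = KeyedVectors.load('wvonly.kv', mmap='r')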
