
I trained a machine learning sentence classification model that uses, among other features, the vectors obtained from a pretrained fastText model (like these), which is 7 GB. I am using the pretrained Italian fastText model: this word embedding is used only to extract some semantic features, which are then fed into the actual ML model.

I built a simple API based on fastText that, at prediction time, computes the vectors needed by the actual ML model. Under the hood, this API receives a string as input and calls get_sentence_vector. When the API starts, it loads the fastText model into memory.
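For context, the API does something like this (a minimal sketch, assuming the official fasttext Python bindings; the web framework and the model file name are just placeholders):

```python
import fasttext
from fastapi import FastAPI

app = FastAPI()

# the ~7 GB pretrained Italian model is loaded once, when the API starts
model = fasttext.load_model("cc.it.300.bin")

@app.get("/vector")
def sentence_vector(text: str):
    # get_sentence_vector also covers out-of-vocabulary words via subwords
    return {"vector": model.get_sentence_vector(text).tolist()}
```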

How can I reduce the memory footprint of fastText, which is loaded into RAM?

Constraints:

  • My model works fine, training was time-consuming and expensive, so I wouldn't want to retrain it using smaller vectors
  • I need the fastText ability to handle out-of-vocabulary words, so I can't use just vectors but I need the full model
  • I should reduce the RAM usage, even at the expense of a reduction in speed.

At the moment, I'm starting to experiment with compress-fasttext...

Please share your suggestions and thoughts even if they do not represent full-fledged solutions.

  • What parameters did you use when training FastText, & which FastText implementation? How crucial to you is the ability to generate vectors for OOV words? Also, why is the RAM size important to minimize - because a system with more RAM isn’t possible or too expensive, or other speed/performance considerations? – gojomo Jun 29 '22 at 16:25
  • Thank you @gojomo! I tried to add this information into the updated question. A small addition: I should reduce RAM usage, based on constraints imposed by system administrators. – Stefano Fiorucci - anakin87 Jun 30 '22 at 08:43
  • Thanks! Because you need the subword info, one quick possibility - going to just full-word vectors, & possibly even slimming those to a most-frequent-word subset – isn't available. (It *might* still be possible to save some space by discarding *some* less-frequent words, which might not have much effect on whole-system performance, especially since they'd still get OOV-synthesized vectors. But it'd likely require some custom model-trimming-and-resaving code, & you'd want to check effects in some repeatable evaluation.) – gojomo Jun 30 '22 at 17:20
  • Sometimes people's concern about RAM is really about load-time, especially in some systems that might reload the model regularly (in each request, or across many service processes) - but if you're really hitting a hard cap based on some fixed/shared deployment system, you'll have to shrink the usage – or upgrade the system. (Given that +8GB RAM isn't too expensive, in either hardware or cloud rentals, at some point you may want to lobby for that. The crossover point, where lost time searching for workarounds has cost more than more-hardware would've, may be closer than 1st assumed.) – gojomo Jun 30 '22 at 17:30
  • With that said, not sure I could outdo whatever that `compress-fasttext` project has achieved – which I've not used but looks effective & thorough in its evaluations. (Other ad hoc things that might work – discarding some arbitrary dimensions of the existing model, other matrix refactorizations to fewer dimensions – are probably done much better by that project.) – gojomo Jun 30 '22 at 17:32
  • (There's one more neat trick used inside spaCy's word-vector support, where they alias rarer words that are close-synonyms to other words to just reuse the same vector - getting N words for the vector-price of one, at some loss of fine word distinctions. But I don't have handy, or know of, any code to apply that to an existing FB model - even though it might work even *better* in fastText, given the continuing contribution of fastText subwords to further specialize words somewhat.) – gojomo Jun 30 '22 at 17:36
  • (I do also see that the `compress-fasttext` author seems to have made a bunch of precompressed models available, including one based on `cc-it`, linked off their December release: https://github.com/avidale/compress-fasttext/releases/tag/gensim-4-draft) – gojomo Jun 30 '22 at 17:43
  • Thank you @gojomo! Your suggestions are really valuable. I will try to update the question or provide an answer when I have done some experimenting. – Stefano Fiorucci - anakin87 Jul 01 '22 at 12:46
  • @gojomo after some study and your great suggestions, I tried to give a general answer. If you can, help me to make it better for the community. Thanks! – Stefano Fiorucci - anakin87 Aug 23 '22 at 12:46

1 Answer


There is no easy solution for my specific problem: if you are using a fastText embedding as a feature extractor and then want to use a compressed version of this embedding, you have to retrain the final classifier, since the produced vectors are somewhat different.

Anyway, I want to give a general answer for

fastText model reduction


Unsupervised models (=embeddings)

You are using pretrained embeddings provided by Facebook, or you trained your own embeddings in an unsupervised fashion (.bin format). Now you want to reduce model size/memory consumption.

Straightforward solutions:

  • compress-fasttext library: it compresses fastText word embedding models by orders of magnitude without significantly affecting their quality; several pretrained compressed models are also available (other interesting compressed models here). See the first sketch after this list.

  • fastText native reduce_model: in this case, you are reducing the vector dimension (e.g. from 300 to 100), so you are explicitly losing expressiveness; under the hood, this method employs PCA. See the second sketch after this list.
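For illustration, here is a minimal compression-and-loading sketch along the lines of the compress-fasttext README (file names are placeholders, and the exact pruning/quantization function to use depends on your needs):

```python
import gensim
import compress_fasttext

# load the original .bin embeddings through gensim and compress them
big_model = gensim.models.fasttext.load_facebook_model("cc.it.300.bin").wv
small_model = compress_fasttext.prune_ft_freq(big_model, pq=True)
small_model.save("cc.it.300.compressed.bin")

# at serving time, only the compressed model needs to be loaded
small_model = compress_fasttext.models.CompressedFastTextKeyedVectors.load(
    "cc.it.300.compressed.bin"
)
print(small_model["ciao"])  # OOV words are still handled via subword n-grams
```

And a sketch of the native reduce_model approach, as shown in the fastText documentation (again, file names are placeholders):

```python
import fasttext
import fasttext.util

ft = fasttext.load_model("cc.it.300.bin")
fasttext.util.reduce_model(ft, 100)  # PCA-based reduction from 300 to 100 dims
ft.save_model("cc.it.100.bin")
```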

If you have the training data and can perform retraining, you can use floret, a fastText fork by Explosion (the company behind spaCy), that uses a more compact representation for vectors.
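A rough retraining sketch, assuming the floret Python bindings (the corpus path and the hashing/bucket parameters are only illustrative; check the floret README for recommended settings):

```python
import floret

# floret mirrors the fastText training API, but stores word and subword
# vectors in a shared, much smaller hash table ("floret" mode)
model = floret.train_unsupervised(
    "corpus_it.txt",
    mode="floret",
    hash_count=2,
    bucket=50000,
    minn=4,
    maxn=5,
    dim=300,
)
model.save_model("vectors.it.floret.bin")
```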

If you are not interested in fastText's ability to represent out-of-vocabulary words (words not seen during training), you can use the .vec file (containing only vectors, not model weights) and keep only a portion of the most common vectors (e.g. the first 200k words/vectors), as sketched below. If you need a way to convert .bin to .vec, read this answer. Note: the gensim package fully supports fastText embeddings (unsupervised mode), so these operations can be done through this library (more details in this answer).
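A minimal sketch of this .vec truncation with gensim (the file names and the 200k cutoff are just examples):

```python
from gensim.models import KeyedVectors

# load only the 200k most frequent word vectors from the .vec text file;
# no subword information is kept, so OOV words are no longer handled
kv = KeyedVectors.load_word2vec_format("cc.it.300.vec", limit=200_000)
kv.save("cc.it.300.top200k.kv")  # gensim native format, also much faster to load
```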

Supervised models

You used fastText to train a classifier, producing a .bin model. Now you want to reduce classifier size/memory consumption.

  • The best solution is fastText native quantize: the model is retrained, applying weight quantization and feature selection. With the retrain parameter, you can decide whether to fine-tune the embeddings or not (see the sketch after this list).
  • You can still use fastText reduce_model, but it leads to less expressive models and does not reduce the model size that much.
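A minimal quantization sketch, following the fastText supervised tutorial (file names and the cutoff value are placeholders):

```python
import fasttext

# train (or load) the supervised classifier, then quantize it
model = fasttext.train_supervised(input="train.txt")
model.quantize(input="train.txt", qnorm=True, retrain=True, cutoff=100_000)
model.save_model("classifier.ftz")  # the .ftz file is much smaller than the .bin
```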
  • Looks good! I think that the native-to-Facebook-`fasttext` `reduce_model` approach may work on `--supervised`-mode models, too – but I've not tried it & the docs aren't clear. It could be worthwhile to note that its approach to reducing the number-of-dimensions is `PCA` behind the scenes. – gojomo Aug 23 '22 at 15:39
  • Thank you. I tested `reduce_model` for supervised models: it works, but it's not optimal. – Stefano Fiorucci - anakin87 Aug 23 '22 at 19:28