I trained a machine learning model for sentence classification that uses, among other features, the vectors obtained from a pretrained fastText model (like these), which is 7 GB. I use the pretrained Italian fastText model, and I use this word embedding only to extract some semantic features to feed into the actual ML model.
I built a simple API based on fastText that computes, at prediction time, the vectors needed by the actual ML model. Under the hood, the API receives a string as input and calls get_sentence_vector. When the API starts, it loads the fastText model into memory.
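For reference, this is roughly what the API does (a minimal sketch; the model path and the wrapper function name are illustrative, while `fasttext.load_model` and `get_sentence_vector` are the standard fastText Python API):

```python
import fasttext

# Loading the full .bin model pulls all word and subword matrices into RAM (~7 GB here).
model = fasttext.load_model("cc.it.300.bin")  # illustrative path to the Italian model

def embed(sentence: str):
    # get_sentence_vector expects a single line of text (no newlines) and
    # returns the sentence embedding built from word/subword vectors.
    return model.get_sentence_vector(sentence.replace("\n", " "))
```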
How can I reduce the memory footprint of fastText, which is loaded into RAM?
Constraints:
- My model works fine, and training it was time-consuming and expensive, so I would rather not retrain it with smaller vectors.
- I need fastText's ability to handle out-of-vocabulary words, so I can't use plain word vectors; I need the full model.
- I want to reduce RAM usage, even at the expense of some speed.
At the moment, I'm starting to experiment with compress-fasttext...
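This is the kind of workflow I'm trying (a rough sketch based on my reading of the compress-fasttext README; paths are placeholders, and I haven't verified how the compressed model fits my API, since it is a gensim-style KeyedVectors object rather than a native fastText model with get_sentence_vector):

```python
import gensim
import compress_fasttext

# One-off compression step: load the original Facebook .bin model with gensim,
# then prune infrequent words/ngrams and product-quantize the remaining vectors.
big_model = gensim.models.fasttext.load_facebook_vectors("cc.it.300.bin")  # placeholder path
small_model = compress_fasttext.prune_ft_freq(big_model, pq=True)
small_model.save("cc.it.300.compressed.bin")

# At serving time, only the compressed model would be loaded:
small_model = compress_fasttext.models.CompressedFastTextKeyedVectors.load(
    "cc.it.300.compressed.bin"
)
vec = small_model["parola"]  # still handles out-of-vocabulary words via subwords
```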
Please share your suggestions and thoughts, even if they aren't full-fledged solutions.