Is it possible to access a fasttext model (gensim) using multithreading?
Currently, I'm trying to load the model once (due to its size and loading time) so that it stays in memory, and then call its similarity functions many thousands of times in a row. I want to do that in parallel, and my current approach uses a wrapper class that loads the model and is then passed to the workers. But it looks like it does not return any results.

The wrapper class, instantiated once:

from os import path

from gensim.models.fasttext import load_facebook_model

class FastTextLocalModel:
    def __init__(self):
        self.model_name = "cc.de.300.bin"
        self.model_path = path.join("data", "models", self.model_name)
        self.fast_text = None

    def load_model(self):
        self.fast_text = load_facebook_model(self.model_path)

    def similarity(self, word1: str, word2: str) -> float:
        return self.fast_text.wv.similarity(word1, word2)

And the Processor class makes use of the FastTextLocalModel methods above:

import concurrent.futures
import multiprocessing

fast_text_instance = FastTextLocalModel()
fast_text_instance.load_model()

with concurrent.futures.ThreadPoolExecutor(max_workers=multiprocessing.cpu_count()) as executor:
    docs = corpus.get_documents()  # docs is iterable
    processor = ProcessorClass(model=fast_text_instance)
    executor.map(processor.process, docs)

Using max_workers=1 seems to work.
I should mention that I have no expertise in Python multithreading.
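One detail worth checking in the snippet above: executor.map returns a lazy iterator, and an exception raised inside a worker only surfaces when the corresponding result is consumed, which can look exactly like getting no results. A minimal sketch that collects the results explicitly, assuming the processor and docs objects defined above:

with concurrent.futures.ThreadPoolExecutor(max_workers=multiprocessing.cpu_count()) as executor:
    futures = [executor.submit(processor.process, doc) for doc in docs]
    for future in concurrent.futures.as_completed(futures):
        try:
            print(future.result())  # re-raises any exception from the worker
        except Exception as exc:
            print(f"worker raised: {exc!r}")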

Daniello

1 Answer


There may be useful ideas for you in this prior answer, which may need adaptation for FastText & latest versions of gensim:

https://stackoverflow.com/a/43067907/130288

(The key points:

  • even redundantly loading the model in different processes may not use redundant memory, if the big memory-consuming arrays are mmapped and thus automatically shared at the OS level; and

  • you have to do a little extra trickery to prevent the usual recalculation of normed vectors that happens after load and before similarity ops, which would destroy the sharing (see the sketch below)

...but messiness in the FastText code might make these a bit harder there.)
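A minimal sketch of that approach, adapted for FastText. The attribute names for the normed-vector aliasing follow the gensim 3.x API and may differ in newer versions; the re-save path cc.de.300.gensim and the example words are hypothetical:

from gensim.models.fasttext import FastText, load_facebook_model

# One-time preparation: re-save in gensim's native format so the big
# arrays land in separate files that can be memory-mapped on load.
model = load_facebook_model("data/models/cc.de.300.bin")
model.save("data/models/cc.de.300.gensim")

# In each worker process: mmap the saved arrays so the OS shares one
# copy of them across all processes.
model = FastText.load("data/models/cc.de.300.gensim", mmap="r")

# The extra trickery: alias the raw arrays as the "normed" arrays so
# the usual post-load recalculation doesn't allocate fresh, unshared
# copies per process. (Attribute names assume gensim 3.x.)
model.wv.vectors_norm = model.wv.vectors
model.wv.vectors_ngrams_norm = model.wv.vectors_ngrams

print(model.wv.similarity("hund", "katze"))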

gojomo