Is it possible to access a fasttext model (gensim) using multithreading?
Currently, I load the model once (because of its size and loading time) so that it stays in memory, and then call its similarity function many thousands of times in a row. I want to do that in parallel, and my current approach uses a wrapper class that loads the model and is then passed to the workers. But it looks like it does not return any results.
The wrapper class, instantiated once:
from os import path
from gensim.models.fasttext import load_facebook_model


class FastTextLocalModel:
    def __init__(self):
        self.model_name = "cc.de.300.bin"
        self.model_path = path.join("data", "models", self.model_name)
        self.fast_text = None

    def load_model(self):
        # Load the pre-trained fastText binary into memory once
        self.fast_text = load_facebook_model(self.model_path)

    def similarity(self, word1: str = None, word2: str = None):
        return self.fast_text.wv.similarity(word1, word2)
And the Processor class makes use of the FastTextLocalModel methods above:
import concurrent.futures
import multiprocessing

fast_text_instance = FastTextLocalModel()
fast_text_instance.load_model()

with concurrent.futures.ThreadPoolExecutor(max_workers=multiprocessing.cpu_count()) as executor:
    docs = corpus.get_documents()  # docs is iterable
    processor = ProcessorClass(model=fast_text_instance)
    executor.map(processor.process, docs)
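ProcessorClass itself is not shown here; roughly, its process method calls the wrapper's similarity on word pairs taken from each document. A minimal sketch of the shape I am working with (the class body and doc.word_pairs are placeholders, not my real code):

class ProcessorClass:
    def __init__(self, model: FastTextLocalModel):
        self.model = model  # shared wrapper instance, loaded once

    def process(self, doc):
        # Placeholder: compute similarities for word pairs extracted from the document
        return [self.model.similarity(w1, w2) for w1, w2 in doc.word_pairs]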
Using max_workers=1 seems to work.
I should mention that I have no expertise in Python multithreading.