
I have a list containing millions of sentences for which I need embeddings, and I am using Flair for this purpose. The problem seems like it should be embarrassingly parallel, but when I try to parallelize it I get either no increase in performance or the program simply stalls.

I define my sentences as a simple list of strings:

texts = [
    "this is a test",
    "to see how well",
    "this system works",
    "here are alot of words",
    "many of them",
    "they keep comming",
    "many more sentences",
    "so many",
    "some might even say",
    "there are 10 of them",
]

I use Flair to create the embeddings:

from flair.embeddings import SentenceTransformerDocumentEmbeddings
from flair.data import Sentence

sentence_embedding = SentenceTransformerDocumentEmbeddings("bert-base-nli-mean-tokens")

def sentence_to_vector(sentence):
    # Wrap the raw string in a flair Sentence, embed it, and return the vector as a plain list
    sentence_tokens = Sentence(sentence)
    sentence_embedding.embed(sentence_tokens)
    return sentence_tokens.get_embedding().tolist()

I tried both joblib and concurrent.futures to solve the problem in parallel:

import time
from joblib import Parallel, delayed
import concurrent.futures

def parallelize(iterable, func):
    # Apply func to every item of iterable on four worker threads
    return Parallel(n_jobs=4, prefer="threads")(delayed(func)(i) for i in iterable)

print("start embedding sequentially")
tic = time.perf_counter()
embeddings = [sentence_to_vector(text) for text in texts]
toc = time.perf_counter()
print(toc - tic)

print("start embedding parallel, w. joblib")
tic = time.perf_counter()
embeddings = parallelize(texts, sentence_to_vector)
toc = time.perf_counter()
print(toc - tic)

print("start embedding parallel w. concurrent.futures")
tic = time.perf_counter()
with concurrent.futures.ProcessPoolExecutor() as executor:
    futures = [executor.submit(sentence_to_vector, text) for text in texts]
    embeddings = [future.result() for future in futures]  # collect the actual results
toc = time.perf_counter()
print(toc - tic)

The joblib version runs, but it is slower than the sequential loop. The concurrent.futures version spins up a number of processes but hangs indefinitely.

Any solutions or hints in the right direction would be much appreciated.

NicolaiF
  • Try to analyze whether flair can handle parallel requests, or whether it queues them and processes them one by one; i.e. if your client sends requests in parallel but the server can only handle them one by one, the speed will be the same. – Deepak Garud Sep 02 '22 at 10:11
  • Flair is a Python package, so the computation never leaves my computer – NicolaiF Sep 02 '22 at 10:40
  • I was comparing it with a model that you have trained and are using to recognize something - if you ask it to recognize two items in parallel, can it do so? – Deepak Garud Sep 02 '22 at 10:56
  • You could split the list into N sub-sets, run N instances of the embedding-generation script, and finally merge the outputs together. Here, N is the number of cores on your machine - better yet, one less than the available cores. – gavin Sep 04 '22 at 07:01
  • @gavin I have attempted to do so, but something in the flair implementation prevents this from being effective – NicolaiF Sep 04 '22 at 10:12
  • @DeepakGarud No, something is prohibiting this; I am not sure what – NicolaiF Sep 04 '22 at 10:13
  • You can also try to create separate processes - e.g. make 10 copies of your script and send 1/10 of your sentences to each, then combine the results. You can use subprocesses or batch-file scripting - whatever you are more comfortable with. Keep an eye on CPU and memory used to avoid the machine hitting 100% CPU usage. – Deepak Garud Sep 05 '22 at 06:38
  • As stated above, this has already been attempted with concurrent.futures & joblib, with no success. – NicolaiF Sep 05 '22 at 08:28
  • There is a difference. My understanding is that all methods tried so far used a single Python program (using flair). By making copies of the file and running them all, there should be no problem with parallel processing. – Deepak Garud Sep 05 '22 at 09:20
  • If you can, please do elaborate on this and post an answer! – NicolaiF Sep 05 '22 at 09:38
  • What are you using, CPU or GPU? Are you sure you are not already utilizing your hardware at 100%? `flair` internally uses multiple CPU cores for its computations, so parallelizing yourself may give you nothing but overhead due to the additional threads. – u1234x1234 Sep 08 '22 at 20:12

1 Answer


Using the analogy of a trained model: it appears that the trained model is only capable of recognizing one item at a time.

By making copies of the file and running them all, there should be no problem with parallel processing. E.g. prog1.py, prog2.py, ... are copies of the same code; when run, each gets different data to process. To run them in parallel manually, open multiple command windows and run a different file in each.

To run them programmatically, a master program can create subprocesses and send different data to each, or a batch file can launch the programs. E.g. run 10 copies of your script and send 1/10 of your sentences to each.
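For illustration, here is a minimal sketch of such a worker script. The file name embed_shard.py, the pickled texts.pkl input, and the command-line shard arguments are assumptions made for the sketch, not part of the asker's setup:

# embed_shard.py -- hypothetical worker; each running copy processes one slice of the data
import pickle
import sys

from flair.data import Sentence
from flair.embeddings import SentenceTransformerDocumentEmbeddings

if __name__ == "__main__":
    shard_id = int(sys.argv[1])    # which slice this copy handles (0-based)
    num_shards = int(sys.argv[2])  # how many copies are running in total

    with open("texts.pkl", "rb") as f:  # assumed input: the pickled sentence list
        texts = pickle.load(f)

    # Each process loads its own model instance, so nothing is shared between copies.
    embedding = SentenceTransformerDocumentEmbeddings("bert-base-nli-mean-tokens")

    # Take a contiguous chunk so the outputs can later be merged by simple concatenation.
    start = len(texts) * shard_id // num_shards
    end = len(texts) * (shard_id + 1) // num_shards

    vectors = []
    for text in texts[start:end]:
        sentence = Sentence(text)
        embedding.embed(sentence)
        vectors.append(sentence.get_embedding().tolist())

    with open(f"embeddings_{shard_id}.pkl", "wb") as f:
        pickle.dump(vectors, f)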

Then combine the results.

Keep an eye on the CPU and memory used to avoid the machine hitting 100% CPU usage. (Slowly increase the number of programs and the amount of data as you figure out how many parallel programs the computer can handle.)
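Under the same assumptions (an embed_shard.py worker that writes embeddings_0.pkl, embeddings_1.pkl, ...), a sketch of a master program that launches the copies and merges the results:

# launch_embedding.py -- hypothetical master program; spawns the workers, waits, then merges
import pickle
import subprocess
import sys

NUM_SHARDS = 4  # start small and increase while watching CPU and memory usage

if __name__ == "__main__":
    # Launch one independent Python process per shard of the data.
    procs = [
        subprocess.Popen([sys.executable, "embed_shard.py", str(i), str(NUM_SHARDS)])
        for i in range(NUM_SHARDS)
    ]
    for proc in procs:
        proc.wait()

    # The workers wrote contiguous chunks, so concatenating the shard files
    # in order restores the original sentence order.
    embeddings = []
    for i in range(NUM_SHARDS):
        with open(f"embeddings_{i}.pkl", "rb") as f:
            embeddings.extend(pickle.load(f))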

Deepak Garud