I am using spaCy with FastAPI to serve requests for an NLP task. I load the spaCy large model when the API starts, and requests are served using that model. What I'm seeing is that the total time for multiple requests increases linearly with the number of parallel requests. How can I integrate spaCy with FastAPI so that multiple requests are served at the same time without the time increasing? I have a 4-core CPU and a single request takes about 4 ms, so I would like to serve 4 requests at the same time in 4 ms.
- Could it be that spaCy doesn't overcome the GIL and blocks on each request? – olepinto Jan 08 '21 at 08:22
- You could try using multiprocessing for the NLP work, in order to utilize all CPU cores; a sketch of that approach follows this comment. Example of using multiprocessing with FastAPI [here](https://stackoverflow.com/a/63171013/13782669) – alex_noname Jan 08 '21 at 10:35
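A minimal sketch of the multiprocessing approach from the comment above, assuming a `ProcessPoolExecutor` sized to the 4 cores; the `en_core_web_lg` model name, the `/nlp` route, and the `extract_entities` helper are illustrative, not from the linked answer:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

import spacy
from fastapi import FastAPI

app = FastAPI()
executor = None
nlp = None  # one model instance per worker process


def init_worker():
    # Runs once in each worker process: load the model there so every
    # request handled by that process reuses it.
    global nlp
    nlp = spacy.load("en_core_web_lg")


def extract_entities(text):
    # CPU-bound spaCy call; it runs in a separate process, so it is not
    # serialized by the GIL.
    doc = nlp(text)
    return [ent.text for ent in doc.ents]


@app.on_event("startup")
def startup():
    global executor
    executor = ProcessPoolExecutor(max_workers=4, initializer=init_worker)


@app.on_event("shutdown")
def shutdown():
    executor.shutdown()


@app.get("/nlp")
async def nlp_endpoint(text: str):
    loop = asyncio.get_running_loop()
    # Each request's spaCy call is dispatched to its own process, so up
    # to 4 requests can run on 4 cores concurrently.
    entities = await loop.run_in_executor(executor, extract_entities, text)
    return {"entities": entities}
```

The trade-off is memory: each worker process loads its own copy of the model, so memory use is roughly four times that of a single process.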
1 Answer
Not very familiar with spaCy, but in general, if you have blocking code you should put it in a non-async (plain `def`) route. FastAPI runs calls to such routes in a threadpool, so they don't block the event loop.
https://fastapi.tiangolo.com/async/#path-operation-functions
```python
@app.get("/blocking")
def blocks():
    # do something blocking
    pass
```
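Applied to the question's setup, a minimal sketch (the `en_core_web_lg` model and the `/ner` route are illustrative):

```python
import spacy
from fastapi import FastAPI

app = FastAPI()
nlp = spacy.load("en_core_web_lg")  # loaded once, when the API starts


@app.get("/ner")
def ner(text: str):
    # Plain `def`: FastAPI runs this in its threadpool, so the event
    # loop stays free to accept other requests while spaCy works.
    doc = nlp(text)
    return {"entities": [(ent.text, ent.label_) for ent in doc.ents]}
```

Note the caveat from the first comment still applies: spaCy's work is CPU-bound Python, so the pool's threads share the GIL and 4 parallel requests may not finish in 4 ms; the multiprocessing sketch above is more likely to achieve that.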
You can also try FastAPI's [background tasks](https://fastapi.tiangolo.com/tutorial/background-tasks/).
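For example, a minimal sketch (the `/submit` route and `process_text` helper are illustrative); note that a background task defers the work until after the response is sent rather than parallelizing it:

```python
from fastapi import BackgroundTasks, FastAPI

app = FastAPI()


def process_text(text: str):
    # Runs after the response has been sent.
    ...


@app.post("/submit")
def submit(text: str, background_tasks: BackgroundTasks):
    # The response returns immediately; process_text runs afterwards.
    background_tasks.add_task(process_text, text)
    return {"status": "queued"}
```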
