I am using spaCy with FastAPI to serve requests for an NLP task. I load the spaCy large model when the API starts, and requests are served using that model. What I'm seeing is that the total time for multiple requests increases linearly with the number of parallel requests. How can I integrate spaCy with FastAPI so that multiple requests are served at the same time without the time increasing? I have a 4-core CPU and a single request takes about 4 ms, so I would like to serve 4 requests at the same time in 4 ms.
- Could it be that spaCy doesn't overcome the GIL and blocks on each request? – olepinto Jan 08 '21 at 08:22
- You could try using multiprocessing for the NLP work, in order to utilize all CPU cores; a sketch of that approach follows this comment. Example of using multiprocessing with FastAPI [here](https://stackoverflow.com/a/63171013/13782669) – alex_noname Jan 08 '21 at 10:35
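A minimal sketch of the multiprocessing approach from the comment above, assuming a `ProcessPoolExecutor` sized to the 4 cores; the `en_core_web_lg` model name, the `/nlp` route, and the `extract_entities` helper are illustrative, not from the linked answer:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

import spacy
from fastapi import FastAPI

app = FastAPI()
executor = None
nlp = None  # one model instance per worker process


def init_worker():
    # Runs once in each worker process: load the model there so every
    # request handled by that process reuses it.
    global nlp
    nlp = spacy.load("en_core_web_lg")


def extract_entities(text):
    # CPU-bound spaCy call; it runs in a separate process, so it is not
    # serialized by the GIL.
    doc = nlp(text)
    return [ent.text for ent in doc.ents]


@app.on_event("startup")
def startup():
    global executor
    executor = ProcessPoolExecutor(max_workers=4, initializer=init_worker)


@app.on_event("shutdown")
def shutdown():
    executor.shutdown()


@app.get("/nlp")
async def nlp_endpoint(text: str):
    loop = asyncio.get_running_loop()
    # Each request's spaCy call is dispatched to its own process, so up
    # to 4 requests can run on 4 cores concurrently.
    entities = await loop.run_in_executor(executor, extract_entities, text)
    return {"entities": entities}
```

The trade-off is memory: each worker process loads its own copy of the model, so memory use is roughly four times that of a single process.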
1 Answer
Not very familiar with spaCy, but in general, if you have blocking code you should put it in a non-async (plain `def`) route. FastAPI runs calls to such routes in a threadpool, so they don't block the event loop.
https://fastapi.tiangolo.com/async/#path-operation-functions
```python
@app.get("/blocking")
def blocks():
    # do something blocking
    pass
```
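Applied to the question's setup, a minimal sketch (the `en_core_web_lg` model and the `/ner` route are illustrative):

```python
import spacy
from fastapi import FastAPI

app = FastAPI()
nlp = spacy.load("en_core_web_lg")  # loaded once, when the API starts


@app.get("/ner")
def ner(text: str):
    # Plain `def`: FastAPI runs this in its threadpool, so the event
    # loop stays free to accept other requests while spaCy works.
    doc = nlp(text)
    return {"entities": [(ent.text, ent.label_) for ent in doc.ents]}
```

Note the caveat from the first comment still applies: spaCy's work is CPU-bound Python, so the pool's threads share the GIL and 4 parallel requests may not finish in 4 ms; the multiprocessing sketch above is more likely to achieve that.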
You can also try FastAPI's [background tasks](https://fastapi.tiangolo.com/tutorial/background-tasks/).
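For example, a minimal sketch (the `/submit` route and `process_text` helper are illustrative); note that a background task defers the work until after the response is sent rather than parallelizing it:

```python
from fastapi import BackgroundTasks, FastAPI

app = FastAPI()


def process_text(text: str):
    # Runs after the response has been sent.
    ...


@app.post("/submit")
def submit(text: str, background_tasks: BackgroundTasks):
    # The response returns immediately; process_text runs afterwards.
    background_tasks.add_task(process_text, text)
    return {"status": "queued"}
```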
