
I'm using FastAPI with non-async endpoints, running under Gunicorn with multiple workers of the uvicorn.workers.UvicornWorker class, as suggested here. Lately, I've noticed high latency on some of our endpoints during the busier times of the day. I started investigating and found that concurrency in our app doesn't work the way we expect.

Let's say I have this FastAPI application (main.py) with the following endpoint:

import logging
import os
import time

from fastapi import FastAPI

app = FastAPI()
logger = logging.getLogger()

@app.get("/")
def root():
    logger.info(f"Running on {os.getpid()}")
    time.sleep(3600)
    return {"message": "Hello World"}

and I run gunicorn with the following cmd:

gunicorn main:app --workers 4 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000

When I send five requests to the server, all of them except the last are handled by the same worker, instead of running in parallel across all workers:

INFO:root:Running on 643
INFO:root:Running on 643
INFO:root:Running on 643
INFO:root:Running on 643
INFO:root:Running on 642
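For anyone wanting to reproduce this: the effect shows up when the requests are fired concurrently, e.g. with a sketch like this (standard library only; the URL just matches the bind address above):

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request


def hit(url: str) -> int:
    # One blocking GET; returns the HTTP status code
    with urllib.request.urlopen(url) as resp:
        return resp.status


def fire_concurrently(urls, fetch=hit):
    # Launch all requests at once so they overlap on the server side
    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        return list(pool.map(fetch, urls))
```

Calling `fire_concurrently(["http://localhost:8000/"] * 5)` against the server above produces the log output shown.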

If I turn the endpoint into an async one, every request is handled by a different worker (the last one is held). I know that for non-async endpoints, FastAPI runs the handlers in AnyIO worker threads, with a default maximum of 40 threads. When I try to lower this limit to, say, 2 threads using the suggestion here, only the first two requests are handled while the rest wait (even though I still have 4 workers!).

That's bad: we're not using all our resources, and on top of that we suffer from Python threading problems due to the GIL within a single worker.

Is there a way to overcome these problems without switching to async endpoints?

Comments:
  • While [this answer](https://stackoverflow.com/a/71517830/17865804) may not answer your question, it might give you a different perspective/solution to the problem you are facing. – Chris Dec 06 '22 at 05:47
  • Are you sending your requests to the server concurrently or in series? – John Moutafis Mar 28 '23 at 15:31
  • Can you please elaborate on how you are sending the requests? I've tried reproducing your issue, but when sending a few requests with the same 4 workers / 2 threads setup, the requests are dispatched among 3 workers and handled in batches of 6 (I'm not sure what the 4th worker does, but it is probably left available so that it can handle incoming requests). – Pierre couy May 14 '23 at 07:30
