
I have an async FastAPI endpoint. At a high level, it fetches some data, does some simple business calculations, and returns a JSON model.

It makes 3 database queries within asyncio.gather(). These calls are made using the SQLAlchemy 1.4 async ORM. The queries are as optimized as they can be (indexes being used, small result sets, only the columns needed, etc.). I also use FastAPI's Depends for the async engine connection.
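
A minimal, self-contained sketch of the concurrency pattern described (the real models and session setup aren't shown in the post, so asyncio.sleep stands in for the awaited SQLAlchemy calls):

```python
import asyncio
import time

async def fake_query(name: str, delay: float) -> str:
    # Stand-in for an awaited session.execute(select(...)); sleeps
    # instead of hitting a database.
    await asyncio.sleep(delay)
    return name

async def handler():
    # The three independent queries run concurrently, so the total I/O
    # time is roughly max(delays), not their sum.
    start = time.perf_counter()
    results = await asyncio.gather(
        fake_query("orders", 0.05),
        fake_query("invoices", 0.05),
        fake_query("payments", 0.05),
    )
    elapsed = time.perf_counter() - start
    return results, elapsed

results, elapsed = asyncio.run(handler())
print(results)  # ['orders', 'invoices', 'payments'] (gather preserves order)
```

One thing worth double-checking in the real handler: a single SQLAlchemy 1.4 AsyncSession is not safe for concurrent use, so each query inside the gather needs its own session (or connection) rather than sharing one injected session.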

That is the only I/O. After the 3 queries return, some non-intensive business logic runs (a few date calculations and match cases), a new “API response” model is generated, cached in Redis, and returned to the user. Pretty straightforward and no frills.
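
The cache-and-return step looks roughly like this; the response fields and cache key are hypothetical, and a plain dict stands in for the Redis client (real code would use redis.asyncio with something like `await client.set(key, payload, ex=ttl)`):

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical response model; the real field names are not shown in the post.
@dataclass
class ApiResponse:
    account_id: int
    days_overdue: int
    status: str

# Dict stand-in for the Redis client.
cache: dict[str, str] = {}

def cache_response(key: str, resp: ApiResponse) -> str:
    # Serialize the response model and store it under the given key
    # before returning it to the caller.
    payload = json.dumps(asdict(resp))
    cache[key] = payload  # real code: await client.set(key, payload, ex=60)
    return payload

payload = cache_response("summary:42", ApiResponse(42, 3, "OVERDUE"))
```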

However, I do not know if my understanding of FastAPI's or Python's capabilities is misled, but I can't seem to reach a high RPS on the endpoint, and latency is also not what I expect.

I have it deployed on Kubernetes, currently set to 50 pods with 1.5 CPU and 2 GB of memory each. CPU is more than enough, as I can see in the pod monitoring, and memory is plentiful. With this, load testing can only reach around 4.5k RPS max; does that seem right? I know other Dropwizard and Spring Boot services of similar scale and functionality reach much higher RPS, around 15k, and their latency is 2-3x quicker.

Code-wise, I know it is pretty much optimized. There are a few things I could still do, like dropping the ORM, but I have tested that and it has little to no effect on performance.

I also noticed “spikes” in latency: the p90 would be ~40ms, but the p95 would be ~400ms, and some responses even get as high as 1.2s. This happens even when traffic is not high.

I'm running out of ideas, and I just want to know: is this kind of RPS expected from FastAPI / Python 3.10, or am I missing something?

  • One thing I saw was to increase the workers, as we are not setting that. But we are using k8s and doing scaling at the cluster level, so I assume it wouldn't make any difference? – K.Madden May 13 '23 at 00:41
  • Please have a look at [this answer](https://stackoverflow.com/a/71517830/17865804) – Chris May 13 '23 at 02:16
  • I assume you have a load balancer installed. Maybe you should try Python profiling tools. – Memristor May 13 '23 at 12:58
  • @Memristor yep, it is, and I will try that out. Do you have any suggestions yourself for profiling an async FastAPI endpoint? – K.Madden May 14 '23 at 18:56
  • Try Py-Spy and check for operations not defined as asynchronous. – Memristor May 15 '23 at 10:05
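
The blocking-call issue the last comment refers to can be demonstrated in isolation (time.sleep stands in for any synchronous call, e.g. a sync DB driver or heavy serialization, slipped into an async path):

```python
import asyncio
import time

async def blocking_endpoint():
    # A synchronous call inside a coroutine stalls the whole event loop:
    # no other task can run until it returns.
    time.sleep(0.1)

async def well_behaved(i: int) -> int:
    # Proper async I/O yields control while waiting.
    await asyncio.sleep(0.01)
    return i

async def main() -> float:
    start = time.perf_counter()
    # The 0.01s tasks cannot make progress while the blocking call
    # holds the loop, so total time is >= 0.1s instead of ~0.01s.
    await asyncio.gather(blocking_endpoint(), *(well_behaved(i) for i in range(5)))
    return time.perf_counter() - start

elapsed = asyncio.run(main())
print(elapsed >= 0.1)  # True
```

At scale, even one such call per request produces exactly the long-tail p95/p99 spikes described above, which is why attaching py-spy to a live worker and looking for non-await frames is a reasonable next step.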
