My team and I are building an API with FastAPI, and we're struggling to get good performance out of it once it's deployed to Kubernetes. We use async calls as much as possible, but we can only sustain ~8 RPS per pod while staying under our SLA of 200 ms at P99.
For resources, we assign the following:
resources:
  limits:
    cpu: 1
    memory: 800Mi
  requests:
    cpu: 600m
    memory: 100Mi
Surprisingly, we don't see this drop when load testing the API locally in a Docker container. There we easily get ~200 RPS on a single container with a P99 latency of 120 ms.
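To be explicit about what the P99 figures above mean, here is a minimal sketch of a nearest-rank P99 check against the 200 ms SLA. The function names and the sample latencies are hypothetical and only for illustration; this is not the actual load-test tooling, and other tools may interpolate percentiles slightly differently.

```python
def p99(latencies_ms):
    """Nearest-rank 99th percentile of latency samples (milliseconds)."""
    if not latencies_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(latencies_ms)
    # 1-based nearest-rank index: ceil(0.99 * n), computed with integers
    # to avoid floating-point rounding surprises.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_sla(latencies_ms, sla_ms=200):
    """True if the P99 latency is within the SLA (200 ms assumed here)."""
    return p99(latencies_ms) <= sla_ms


# Hypothetical samples: 98 fast requests and 2 slow ones.
samples = [50] * 98 + [250] * 2
print(p99(samples), meets_sla(samples))
```

With only 2% of requests being slow, the P99 already lands on a slow request, which is why a small fraction of stalled requests is enough to break the SLA.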
Does anyone have an idea of what could be going wrong and where I should start looking for the bottleneck?
Cheers!