
With my team, we're currently building an API using FastAPI, and we're really struggling to get good performance out of it once it's deployed to Kubernetes. We're using async calls as much as possible, but we're sitting at roughly 8 RPS per pod in order to stay under our SLA of 200ms at P99.

For resources, we assign the following:

resources:
    limits:
        cpu: 1
        memory: 800Mi
    requests:
        cpu: 600m
        memory: 100Mi

Surprisingly, such performance drops don't occur when running load tests against the API in a local Docker container. There we easily get ~200 RPS on a single container with a P99 latency of 120ms...

Would anyone have an idea of what could go wrong in there and where I could start looking to find the bottleneck?

Cheers!

dernat71
  • Please have a look at [**this answer**](https://stackoverflow.com/a/71517830/17865804) to understand the difference between using `def` and `async def`, and how your API's performance may be affected by CPU-bound operations when using asynchronous code. – Chris Aug 03 '22 at 13:37
  • @Chris That doesn't seem to be the issue here, as it works great locally under a load test. dernat71, how is your k8s cluster sized? Don't forget about the daemons that take up (sometimes a significant amount of) resources. – JarroVGIT Aug 03 '22 at 20:06
  • @JarroVGIT I am aware of that; however, from their wording (i.e., _"We're using async calls as much as possible, but..."_), they don't seem to have a clear view of how [`async`/`await`](https://fastapi.tiangolo.com/async/) works (which might be another cause of the performance results they've obtained) - hence the suggested answer. – Chris Aug 04 '22 at 04:06
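
To illustrate the point raised in the comments above, here is a minimal sketch (hypothetical endpoints, not the OP's code) of how a blocking call behaves differently inside an `async def` endpoint versus a plain `def` endpoint in FastAPI:

    # Minimal sketch (hypothetical, not the OP's application).
    # A blocking call inside an `async def` endpoint stalls the event loop,
    # while the same call in a plain `def` endpoint runs in FastAPI's threadpool.
    import time

    from fastapi import FastAPI

    app = FastAPI()

    @app.get("/blocking-async")
    async def blocking_async():
        time.sleep(1)  # blocks the event loop; every other request waits
        return {"ok": True}

    @app.get("/blocking-def")
    def blocking_def():
        time.sleep(1)  # runs in a worker thread; the event loop stays responsive
        return {"ok": True}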

2 Answers


First, try to request at least 1 CPU for your API. If there are no spare CPUs on the node, the pod only gets its reserved share of CPU time, which is 600m here; so if, for example, another application on the same node requests cpu=400m, Kubernetes will run both applications on the same CPU, giving roughly 60% of the CPU time to your API and 40% to the other application. Locally, Docker can use a full CPU (and possibly more).

If you are using Uvicorn with multiple workers, you can also increase the CPU limit to at least 2 (see the sketch after the resources block below):

resources:
    limits:
        cpu: 2
        memory: 800Mi
    requests:
        cpu: 1
        memory: 100Mi
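
For the 2-CPU limit to pay off, the server actually has to be started with more than one worker. A minimal sketch of such a start command, assuming the app module is `app.main:app` (as in the comments below); the module path and port are assumptions:

    # Sketch: start Uvicorn with 2 worker processes to match the 2-CPU limit
    uvicorn app.main:app --host 0.0.0.0 --port 8080 --workers 2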

Finally, there is a difference between your local machine's CPUs and the Kubernetes cluster's CPUs. If you want good performance, you can benchmark different CPU types and choose the most suitable one in terms of cost.

Hussein Awala

It turned out that our performance issues were caused by running only Uvicorn, without Gunicorn (even though FastAPI's author recommends against that setup in his documentation). The Uvicorn authors, on the other hand, recommend exactly that in their docs, i.e. running Uvicorn behind Gunicorn. We followed that advice and our performance issues were gone.
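
For reference, "running Uvicorn behind Gunicorn" means a start command along these lines (a sketch only; the `app.main:app` module path and the worker count are assumptions, not our actual command):

    # Sketch: Gunicorn as the process manager, Uvicorn as the worker class
    gunicorn app.main:app -k uvicorn.workers.UvicornWorker --workers 4 --bind 0.0.0.0:8080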

As suggested by people in this thread, requesting more CPU in the PodSpec was also part of the solution.

EDIT: In the end, we discovered that the performance issues were actually caused by our OpenTelemetry instrumentation of FastAPI via the opentelemetry-instrument CLI. That wrapper was adding a lot of overhead and introducing blocking calls into FastAPI's async code path. Performance is now super stable with both gunicorn and uvicorn. We are still running gunicorn with multiple workers, but we are also planning to move back to single-process uvicorn and scale more dynamically.
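
For context, the opentelemetry-instrument CLI wraps the server start command; a rough sketch of that kind of invocation (the module path, port, and omitted exporter flags are assumptions, not the exact setup described above):

    # Sketch: auto-instrumentation wrapper around the ASGI server process
    opentelemetry-instrument uvicorn app.main:app --host 0.0.0.0 --port 8080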

dernat71
  • @dernat17 Can you please explain how you implemented gunicorn or uvicorn on k8s? I am currently using `CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8080"]` in my Dockerfile and deploying it to k8s. Did you change anything here? – Md Fazlul Karim Sep 18 '22 at 13:46
  • `CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8080", "--workers", "4"]` - did you do this in your Dockerfile? Isn't it an antipattern? – Md Fazlul Karim Sep 18 '22 at 13:49
  • @MdFazlulKarim We finally discovered that the performance issues were caused by our implementation of OpenTelemetry, which was causing a lot of overhead and blocking calls in FastAPI's async code path. Performance is now super stable with both gunicorn and uvicorn. We are still using gunicorn with multiple workers, but we are also planning to move back to single-process uvicorn and scale more dynamically. – dernat71 Sep 19 '22 at 15:03
  • @dernat17 Great to know. May I ask what your current alternative to OpenTelemetry is? – Md Fazlul Karim Sep 21 '22 at 11:13
  • @MdFazlulKarim We're still using OpenTelemetry, as it appears to be the direction the whole domain is taking. We made sure the usage of the opentelemetry-instrument CLI wrapper isn't blocking anymore :-) – dernat71 Sep 21 '22 at 16:05