We are trying to decrease the latency of our BERT model prediction service, which is deployed using FastAPI. Predictions are served through the `/predict` endpoint. We looked into the tracing and found that one of the bottlenecks is the `prometheus-fastapi-instrumentator`. About 1% of requests time out because they exceed 10s.
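
For context, the instrumentation is set up in what I believe is the standard way (a simplified sketch of our setup): the middleware wraps every request, and `/metrics` is exposed on the same app, so scrapes share the event loop and workers with `/predict`.

```python
from fastapi import FastAPI
from prometheus_fastapi_instrumentator import Instrumentator

app = FastAPI()  # the same app that serves /predict

# The middleware records latency/counters for every request, and /metrics
# is exposed on this app, so scrapes go through the same uvicorn workers
# and event loop as /predict.
Instrumentator().instrument(app).expose(app, endpoint="/metrics")
```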
We also discovered that some metrics stop being reported at around 4 requests/second. Some requests took 30-50 seconds, with the starlette/fastapi spans in the trace taking most of that time. So it seems that under high load, the `/metrics` endpoint doesn't get enough resources, and hence all `/metrics` requests wait for some time and eventually fail. Having a separate container for metrics could help, or, if possible, delaying/pausing metrics collection under high load. Any insight/guidance would be much appreciated.
Code Example:

This is a template I used to build my FastAPI prediction service. The only difference is that I'm using a BERT-based model instead of the simple model used in the template.
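
Stripped down, the service looks roughly like this (a minimal sketch, not the exact template; the `transformers` pipeline below is just a stand-in for however our fine-tuned BERT model is actually loaded):

```python
from fastapi import FastAPI
from pydantic import BaseModel
from prometheus_fastapi_instrumentator import Instrumentator
from transformers import pipeline  # placeholder for our real model loading

app = FastAPI()

# Loaded once at import time; "bert-base-uncased" is a placeholder here.
classifier = pipeline("text-classification", model="bert-base-uncased")

class PredictionRequest(BaseModel):
    text: str

@app.post("/predict")
def predict(request: PredictionRequest):
    # Plain `def` (not `async def`) so the blocking inference call runs in
    # the threadpool instead of blocking the event loop.
    return classifier(request.text)[0]

# Same instrumentation as in the snippet above.
Instrumentator().instrument(app).expose(app)
```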