
We have an ML model served with Flask. Load testing the Flask application with Gatling (https://gatling.io/) showed very poor performance: it could not handle many requests per second. Therefore we have moved to FastAPI.

Serving it locally in a Docker container with uvicorn or gunicorn worked well. However, we have noticed that the application doesn't respond for minutes:

[Image: Gatling load test - local Docker container]

In this image you can see that the application responds in "batches". Serving our application in a Kubernetes cluster leads to restarts of the container, because the container fails its readiness/liveness probes while it is unresponsive.

We have asked this question on uvicorn's GitHub, but I don't think we will get an answer there. We suspect we have written code that blocks the main thread, which is why our FastAPI application doesn't answer for minutes.

Snippet of the application endpoint:

async def verify_client(token: str):
    credentials_exception = HTTPException(
        status_code=status.HTTP_401_UNAUTHORIZED,
        detail="Could not validate credentials",
        headers={"WWW-Authenticate": "Bearer"},
    )
    try:
        return jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM], audience=AUDIENCE)
    except JWTError:
        raise credentials_exception


@app.post("/score", response_model=cluster_api_models.Response_Model)
async def score(request: cluster_api_models.Request_Model, token: str = Depends(oauth2_scheme)):
    logger.info("Token: {0}".format(token))
    await verify_client(token)
    result = await do_score(request)
    return result

The await do_score(request) call contains all the preprocessing and prediction code. It uses a gensim FastText model to create document vectors and a scikit-learn K-Means model. do_score() is defined with async def do_score(request). From the FastAPI documentation, we thought this would be enough to make our application asynchronous, but it doesn't look like it: requests are still processed sequentially, and in addition the application doesn't respond for minutes. The method also includes a nested for loop, O(n²)... not sure whether that can cause blocking too.
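To illustrate (this is not our real code; the names such as request.texts and the model objects are invented, and the logic is heavily simplified), the shape of do_score is roughly:

import numpy as np

# fasttext_model (gensim FastText) and kmeans_model (scikit-learn KMeans)
# are assumed to be loaded once at application startup.
# Although declared with "async def", there is no await inside: everything is
# synchronous, CPU-bound work, so it runs on the event loop and blocks it.
async def do_score(request):
    # preprocessing: one document vector per text, averaged FastText word vectors
    vectors = [
        np.mean([fasttext_model.wv[token] for token in text.split()], axis=0)
        for text in request.texts
    ]

    # nested for loop, roughly O(n^2) pairwise work on the vectors
    for i in range(len(vectors)):
        for j in range(len(vectors)):
            ...  # pairwise computation

    # prediction with the K-Means model
    labels = kmeans_model.predict(vectors)
    return cluster_api_models.Response_Model(labels=labels.tolist())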

I hope the information provided is enough to get started. If you need more information about the code, please tell me; I will need to change some variable names first. Thank you very much in advance!

– AFUEU

2 Answers


The correct answer here is to use a non-async route def. FastAPI documents that any route containing synchronous/blocking code should be declared this way; FastAPI then runs it in its internal thread pool, which keeps the event loop free and effectively gives you a pseudo-async route.

@app.post("/score", response_model=cluster_api_models.Response_Model)
def score(request: cluster_api_models.Request_Model, token: str = Depends(oauth2_scheme)):
    logger.info("Token: {0}".format(token))
    verify_client(token)
    result = do_score(request)
    return result

Here's a link to the FastAPI documentation that I'm referencing:

https://fastapi.tiangolo.com/async/#path-operation-functions
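If you prefer to keep the route itself async (for example because it awaits other coroutines), another option is to push only the blocking call into the thread pool yourself with Starlette's run_in_threadpool. A minimal sketch, assuming do_score is rewritten as an ordinary synchronous function:

from fastapi.concurrency import run_in_threadpool

@app.post("/score", response_model=cluster_api_models.Response_Model)
async def score(request: cluster_api_models.Request_Model, token: str = Depends(oauth2_scheme)):
    logger.info("Token: {0}".format(token))
    await verify_client(token)
    # run the blocking, CPU-bound scoring in the thread pool,
    # so the event loop stays free to answer other requests (and probes)
    result = await run_in_threadpool(do_score, request)
    return result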

  • Can you define the correct answer, and what makes my answer incorrect? I've been around the FastAPI community for a long time and I'm aware of the corner cases; there are big issues like [596](https://github.com/tiangolo/fastapi/issues/596) and [1624](https://github.com/tiangolo/fastapi/issues/1624) which cause memory leaks in the long run, and running a function in a threadpool that takes ~500ms ain't no good. If you have a function that takes between 1s and 5s you can use it; for >5s you should go for other things. – Yagiz Degirmenci Jan 12 '21 at 21:23
  • Thanks for linking to those issues; I was not aware of any memory leaks. So I probably should have said that the accepted answer left out the more simplistic solution (non-async route defs), which is documented and supported by FastAPI. I think my answer did define what the correct answer is, IMO. What is it missing? – NoPlaceLike127.0.0.1 Jan 13 '21 at 14:47

Of course something will block your application if your application is not fully async; async is just a fancy keyword here.

Even if you define a function with async def, if it does something blocking underneath it will block the entire execution of your app. Aren't you convinced? Test it.

@app.get("/dummy")
async def dummy():
    time.sleep(5)

Let's send 3 concurrent requests to it.

for _ in {1..3}; do curl http://127.0.0.1:8000/dummy & done

This will take about 15 seconds in total, because the three requests are handled one after another.
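Compare that with the non-blocking equivalent; the three requests overlap and the whole thing finishes in roughly 5 seconds:

import asyncio

@app.get("/dummy")
async def dummy():
    # awaiting asyncio.sleep yields control back to the event loop,
    # so other requests can be served while this one "sleeps"
    await asyncio.sleep(5)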

Let's dive deeper. I said async def is just fancy syntax for declaring a coroutine. Why? See PEP 492:

async def functions are always coroutines, even if they do not contain await expressions.

Why does it matter?

When you define a coroutine and use the await syntax, you are telling the event loop to keep going: it suspends that coroutine, switches to another one, and runs it.

What is the difference?

Basically, a coroutine doesn't sit there waiting for the result; it yields and lets other work run. But when you call a normal (blocking) function, the loop waits for that call to finish, of course.

Since we both know it would block, what can you do?

You might want to use a Job/Task Queue library like Celery.
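A minimal sketch of what that could look like (purely illustrative: it assumes a Redis broker/result backend, a synchronous do_score_sync helper, and that the models are loaded inside the Celery worker):

from celery import Celery

celery_app = Celery(
    "scoring",
    broker="redis://localhost:6379/0",
    backend="redis://localhost:6379/0",
)

@celery_app.task
def score_task(payload: dict) -> dict:
    # heavy, CPU-bound work runs in a separate Celery worker process,
    # which loads the FastText / K-Means models once at startup
    return do_score_sync(payload)

# in the FastAPI app: enqueue the job and return immediately
@app.post("/score")
async def score(request: cluster_api_models.Request_Model, token: str = Depends(oauth2_scheme)):
    await verify_client(token)
    task = score_task.delay(request.dict())  # payload must be JSON-serializable
    return {"task_id": task.id}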

– Yagiz Degirmenci
  • Thank you very much for the fast answer. We thought FastAPI would take care of creating a Job/Task Queue. Guess we were totally wrong. So each endpoint (liveness/readiness check) of the FastAPI application and each function needs to be taken care of by Celery? – AFUEU Nov 20 '20 at 08:13
  • Update: I have just read about BackgroundTasks in FastAPI. It is recommended for small tasks or use cases where you need access to variables created within the same FastAPI application. Since we are using a fairly large text model, Celery might not be an option. Will update on what works for us. – AFUEU Nov 20 '20 at 08:39
  • In our case, we have models that take approx. 3-5 minutes and small tasks of about 20 seconds, but we use Celery for all our tasks by setting a priority, which is the reason I did not suggest `BackgroundTasks`; there are a lot of open issues with it. Here is the most common [one](https://github.com/encode/starlette/issues/919). – Yagiz Degirmenci Nov 20 '20 at 09:33
  • How do you make sure Celery (I am really not against this option) can access the models loaded in the FastAPI application, or do you load your model on each request? – AFUEU Nov 20 '20 at 12:41
  • The usual time to execute the method `do_score(request)` is about 500ms. – AFUEU Nov 20 '20 at 12:51
  • No, it does not run directly in the API; it is completely independent. Maybe our use case will not fit yours, because we have a process like this: `Request -> Add to RabbitMQ -> Call that from Celery -> Process the data with models -> Insert results into the database`. Then we are able to query the results. – Yagiz Degirmenci Nov 20 '20 at 12:58
  • But in your case, 500ms is such a low latency, have you done any profiling on your app? – Yagiz Degirmenci Nov 20 '20 at 12:59
  • Aside from the usual logging messages and using Gatling, there is nothing else I can think of now. – AFUEU Nov 23 '20 at 10:40