3

I'm new to Python development (but I have a dotnet background). I have a simple FastAPI application:

from fastapi import FastAPI
import time
import logging
import asyncio
import random

app = FastAPI()
r = random.randint(1, 100)
logging.basicConfig(level="INFO", format='%(levelname)s | %(asctime)s | %(name)s | %(message)s')
logging.info(f"Starting app {r}")

@app.get("/")
async def long_operation():
    logging.info(f"Starting long operation {r}")
    await asyncio.sleep(1)
    time.sleep(4)  # I know this blocks and the endpoint is marked as async, but I actually do have some blocking calls in my code.
    return r

And I run the app using this command:

uvicorn "main:app" --workers 4

And the app starts 4 instances in different processes:

INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO:     Started parent process [22112]
INFO | 2023-05-11 12:32:43,544 | root | Starting app 17
INFO:     Started server process [10180]  
INFO:     Waiting for application startup.
INFO:     Application startup complete.   
INFO | 2023-05-11 12:32:43,579 | root | Starting app 58
INFO:     Started server process [29592]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO | 2023-05-11 12:32:43,587 | root | Starting app 12
INFO:     Started server process [7296]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO | 2023-05-11 12:32:43,605 | root | Starting app 29
INFO:     Started server process [15208]
INFO:     Waiting for application startup.
INFO:     Application startup complete.

Then I open 3 browser tabs and start sending requests to the app as parallel as possible. Here is the log:

INFO | 2023-05-11 12:32:50,770 | root | Starting long operation 29
INFO:     127.0.0.1:55031 - "GET / HTTP/1.1" 200 OK
INFO | 2023-05-11 12:32:55,774 | root | Starting long operation 29
INFO:     127.0.0.1:55031 - "GET / HTTP/1.1" 200 OK
INFO | 2023-05-11 12:33:00,772 | root | Starting long operation 29
INFO:     127.0.0.1:55031 - "GET / HTTP/1.1" 200 OK
INFO | 2023-05-11 12:33:05,770 | root | Starting long operation 29
INFO:     127.0.0.1:55031 - "GET / HTTP/1.1" 200 OK
INFO | 2023-05-11 12:33:10,790 | root | Starting long operation 29
INFO:     127.0.0.1:55031 - "GET / HTTP/1.1" 200 OK
INFO | 2023-05-11 12:33:15,779 | root | Starting long operation 29
INFO:     127.0.0.1:55031 - "GET / HTTP/1.1" 200 OK
INFO | 2023-05-11 12:33:20,799 | root | Starting long operation 29
INFO:     127.0.0.1:55031 - "GET / HTTP/1.1" 200 OK
INFO | 2023-05-11 12:33:25,814 | root | Starting long operation 29
INFO:     127.0.0.1:55031 - "GET / HTTP/1.1" 200 OK
INFO | 2023-05-11 12:33:30,856 | root | Starting long operation 29
INFO:     127.0.0.1:55031 - "GET / HTTP/1.1" 200 OK

My observations:

  1. Only 1 process is working; the others do not handle requests. (I have tried many times; it is always like that.)
  2. 4 different instances are created.

My questions:

  1. Why does only one process work while the others don't?
  2. If I want to have an in-memory cache, can I achieve that?
  3. Can I run a single process that handles some number of requests in parallel?
  4. Can this somehow be related to the fact that I run my tests on Windows?

UPDATE+SOLUTION:

My real problem was the def/async def behavior (which I find very confusing). I was trying to solve the blocked-thread problem by using multiple workers, which also behaved strangely in my case (only 1 actually worked), and that's probably because I used a single browser with many tabs. Once I tested the service with JMeter, it showed that all workers were used. But the multi-process solution was not the right one for me; the better approach was to unblock the single thread in a single process. At first I used a workaround, because I was using an external library with a sync IO function. However, I then found an async variant of that function. So the problem was solved by using the correct library. Thank you all for your help.
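For reference, before switching to the async variant of the library, the interim way to keep the event loop unblocked would have been to offload the sync call to a worker thread. A minimal sketch with `asyncio.to_thread` (here `blocking_fetch` is a hypothetical stand-in for the sync library call):

```python
import asyncio
import time

def blocking_fetch() -> str:
    # stand-in for a sync library call that blocks its thread
    time.sleep(0.2)
    return "result"

async def handler() -> str:
    # offload the blocking call to a thread so the event loop stays free
    return await asyncio.to_thread(blocking_fetch)

async def main():
    start = time.monotonic()
    # two "requests" run concurrently despite the blocking call
    results = await asyncio.gather(handler(), handler())
    elapsed = time.monotonic() - start
    print(results)                 # ['result', 'result']
    print(round(elapsed, 1))       # ~0.2, not 0.4: the calls overlapped

asyncio.run(main())
```

The native async variant of a library is still preferable when it exists, because threads carry their own overhead and the GIL limits them for CPU-bound work.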

Anubis

1 Answer

2
  1. Why does only one process work while the others don't?

I can't reproduce your observation, and in fact I don't know how you deduced that. If I change your logging format to

logging.basicConfig(level="INFO", format='%(process)d | %(levelname)s | %(asctime)s | %(name)s | %(message)s')

(note the %(process)d, which prints the process id), then I see in the logs

19968 | INFO | 2023-05-11 12:45:53,297 | root | Starting long operation 35
21368 | INFO | 2023-05-11 12:45:56,112 | root | Starting long operation 90
5268 | INFO | 2023-05-11 12:45:56,626 | root | Starting long operation 3
22024 | INFO | 2023-05-11 12:45:57,032 | root | Starting long operation 19
5268 | INFO | 2023-05-11 12:45:57,416 | root | Starting long operation 3
22024 | INFO | 2023-05-11 12:45:57,992 | root | Starting long operation 19

after spawning multiple requests in parallel. Is it possible that you fired your requests incorrectly, i.e. not in parallel?

Anyway, yes, all workers are utilized. The exact way they are chosen is, however, an implementation detail.

  2. If I want to have an in-memory cache, can I achieve that?

You mean shared between workers? Not really. You can do some cross-process communication (e.g. shared memory), but this is not simple to implement and maintain. Generally you would use an in-memory cache per process, unless you are limited by memory, in which case it does become a problem.
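A per-process cache can be as simple as a module-level dict; each worker process gets its own independent copy. An illustrative sketch with a TTL (the key names and the 60-second TTL are arbitrary choices for the example):

```python
import time

# module-level dict: lives inside one worker process only
_cache: dict = {}
TTL = 60.0  # seconds; arbitrary for this sketch

def cache_get(key):
    entry = _cache.get(key)
    if entry is None:
        return None  # miss
    stored_at, value = entry
    if time.monotonic() - stored_at > TTL:
        del _cache[key]  # expired entry
        return None
    return value

def cache_set(key, value):
    _cache[key] = (time.monotonic(), value)

cache_set("user:42", {"name": "Alice"})
print(cache_get("user:42"))   # {'name': 'Alice'}
print(cache_get("user:99"))   # None
```

If the cache must be shared and consistent across workers, an external store (e.g. Redis) is the usual choice rather than hand-rolled shared memory.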

  3. Can I run a single process that handles some number of requests in parallel?

I'm not sure I understand your question. You can run uvicorn with --workers 1 if you want, no problem. Python's default async runtime is single-threaded, though, so you won't get true parallelism, but rather concurrency, similar to how JavaScript works. Therefore you need to be careful: avoid blocking calls like time.sleep and use non-blocking calls like asyncio.sleep instead. With async programming you always have to be careful about that, regardless of how many processes you spawn.

  4. Can this somehow be related to the fact that I run my tests on Windows?

No, this is unrelated to the operating system. This design stems from a major limitation of Python itself: the GIL (Global Interpreter Lock), which makes threads far less useful than in other runtimes such as dotnet/C#. In Python, true parallelism is achieved through subprocesses.

freakish