How does fastapi/uvicorn parallelize requests?

Question

I ran an experiment with fastapi and uvicorn that I don't understand the outcome of.

Given the code

import time
from fastapi import FastAPI

app = FastAPI()
@app.get('/loadtest')
def root():
    time.sleep(1)  # simulate one second of blocking work
    return {'message': 'hello'}

running in Docker with

CMD ["uvicorn", "app.main:app", "--proxy-headers", "--host", "0.0.0.0", "--port", "80"]

I ran the following test:

ab -c 100 -n 1000 localhost/loadtest

which gives me the results:

bersling-2:cas bersling$ ab -c 100 -n 1000 localhost/loadtest
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests


Server Software:        uvicorn
Server Hostname:        localhost
Server Port:            80

Document Path:          /loadtest
Document Length:        19 bytes

Concurrency Level:      100
Time taken for tests:   85.052 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      163000 bytes
HTML transferred:       19000 bytes
Requests per second:    11.76 [#/sec] (mean)
Time per request:       8505.191 [ms] (mean)
Time per request:       85.052 [ms] (mean, across all concurrent requests)
Transfer rate:          1.87 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.7      0       4
Processing:  1008 7964 1419.3   8010    9022
Waiting:     1004 7963 1419.3   8008    9020
Total:       1008 7965 1418.7   8010    9024

Percentage of the requests served within a certain time (ms)
  50%   8010
  66%   8016
  75%   8988
  80%   8989
  90%   8993
  95%   8998
  98%   9003
  99%   9006
 100%   9024 (longest request)

So we see this takes around 85s to complete. However, I would have expected 1000s, since I assumed the requests would be processed sequentially and each request takes one second. I assumed a sequential queue because I thought Python can only handle one request at a time (in synchronous mode), and I'm not aware of uvicorn spawning multiple processes or threads. So I don't understand how a result of roughly 85s instead of 1000s is possible. Can somebody please explain?

If I'm not mistaken, the workers help uvicorn dispatch the workload. — DueSouth, Dec 22 '21 at 09:15
Uvicorn is also an asynchronous engine, so it can process more requests than Gunicorn, for instance. This makes the processing of the requests faster. — DueSouth, Dec 22 '21 at 09:21
@Charley, the odd thing here is that uvicorn is running in a single process. There are no other workers. Moreover, the endpoint is not defined with `async`. It's not clear where such parallelization comes from. — floatingpurr, Jan 03 '22 at 14:51
As per the [doc](https://fastapi.tiangolo.com/deployment/concepts/#replication-processes-and-memory), _"With a FastAPI application, using a server program like Uvicorn, running it once in one process can serve multiple clients concurrently."_ Apparently, it happens even in the absence of asynchronous code. — floatingpurr, Jan 03 '22 at 15:05
Ok, here you can find the explanation: https://github.com/tiangolo/fastapi/discussions/4358 — floatingpurr, Jan 04 '22 at 09:06
@floatingpurr thank you for the investigation, will add that as an answer — bersling, Jan 04 '22 at 16:49

score 11 · Answer 1 · answered Jan 04 '22 at 16:55

To quote the docs:

When you declare a path operation function with normal def instead of async def, it is run in an external threadpool that is then awaited, instead of being called directly (as it would block the server).

Or to quote the answer from the GitHub discussion, which is a bit simpler to read:

For endpoints defined with def (not async def), FastAPI will run them in a threadpool, exactly as to avoid blocking the server and allow multiple requests to be served in parallel.
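
The effect of such a threadpool can be reproduced without FastAPI. The following is a minimal sketch using only the standard library, not FastAPI's actual implementation: `blocking_handler` and `serve` are hypothetical names, the sleep is shortened to 0.1 s, and the pool sizes of 1 and 4 are arbitrary. It hands a blocking function to a thread pool from an event loop and awaits the results, which is the pattern the quotes describe.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_handler() -> dict:
    """Stand-in for a `def` endpoint that blocks its thread."""
    time.sleep(0.1)
    return {"message": "hello"}


async def serve(n_requests: int, n_threads: int) -> float:
    """Run n_requests blocking handlers on a pool of n_threads; return seconds taken."""
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        start = time.perf_counter()
        # Each handler runs in a worker thread; the event loop only awaits the results.
        await asyncio.gather(
            *(loop.run_in_executor(pool, blocking_handler) for _ in range(n_requests))
        )
        return time.perf_counter() - start


elapsed_one = asyncio.run(serve(8, 1))   # one worker: calls run back to back
elapsed_four = asyncio.run(serve(8, 4))  # four workers: calls overlap
print(f"1 thread: {elapsed_one:.2f}s, 4 threads: {elapsed_four:.2f}s")
```

With one worker thread the eight calls run back to back; with four they overlap, so the batch finishes noticeably faster. Read the same way, the benchmark in the question (11.76 requests per second at one second per request) suggests that roughly 12 handlers were running at once, though the exact pool size depends on the library versions and the machine.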

This raises the question of how many threads run concurrently and how that number can be controlled. This question is addressed here.

For convenience, here is the quote again:

FastAPI is based on Starlette, which used to control the ThreadPoolExecutor, but Starlette is now using anyio, so I don't see a better way than your proposition:

RunVar("_default_thread_limiter").set(CapacityLimiter(2))
