
I am using gunicorn to run a simple HTTP server¹ with e.g. 8 sync workers (processes). For practical reasons, I am interested in knowing how gunicorn distributes incoming requests between these workers.

Assume that all requests take the same time to complete.

Is the assignment random? Round-robin? Resource-based?

The command I use to run the server:

gunicorn --workers 8 --bind 0.0.0.0:8000 main:app

¹ I'm using FastAPI but I believe this is not relevant for this question.

Gino Mempin
Nur L

2 Answers


Gunicorn does not distribute requests.

Each worker is spawned with the same LISTENERS (e.g. gunicorn.sock.TCPSocket) in Arbiter.spawn_worker(), and calls listener.accept() on its own.

Which worker handles a given request is decided inside that blocking accept() call: the OS kernel wakes one of the workers blocked on the socket and hands it the client connection. This is an OS implementation detail, and empirically it is neither round-robin nor resource-based.
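The pre-fork mechanism can be sketched outside of gunicorn in a few lines. This is a minimal, hypothetical illustration (Unix-only, since it uses os.fork; the names serve_once/main are my own): the parent binds and listens, then forks workers that all block in accept() on the same socket. No process distributes requests; the kernel wakes one blocked worker per incoming connection.

```python
import os
import socket

def serve_once(listener: socket.socket) -> None:
    conn, _ = listener.accept()              # blocking OS call, as in SyncWorker
    conn.recv(64)
    conn.sendall(str(os.getpid()).encode())  # reply with this worker's PID
    conn.close()

def main() -> list:
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))          # ephemeral port
    listener.listen(8)
    port = listener.getsockname()[1]

    children = []
    for _ in range(4):                       # "pre-fork": fork AFTER listen()
        pid = os.fork()
        if pid == 0:                         # child = worker
            serve_once(listener)
            os._exit(0)
        children.append(pid)
    listener.close()                         # the parent ("master") never accepts

    served_by = []
    for _ in range(4):                       # 4 requests, each answered by some worker
        c = socket.create_connection(("127.0.0.1", port))
        c.sendall(b"ping")
        served_by.append(c.recv(64).decode())
        c.close()

    for pid in children:
        os.waitpid(pid, 0)
    return served_by

if __name__ == "__main__":
    print(main())
```

Each request comes back stamped with the PID of whichever worker the kernel happened to wake, which is exactly the "load balancing" gunicorn delegates to the OS.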

Reference from the docs

From https://docs.gunicorn.org/en/stable/design.html:

Gunicorn is based on the pre-fork worker model. ... The master never knows anything about individual clients. All requests and responses are handled completely by worker processes.

Gunicorn relies on the operating system to provide all of the load balancing when handling requests.

Other reading

aaron

In my case (also with FastAPI), I found that it starts out round-robin and then degrades once all the workers are busy.

Example:

  • you send 100 requests at the same time
  • the first 8 are distributed across the 8 sync workers
  • the remaining 92 are then assigned to whichever of the first 8 workers becomes free first
  • only once ALL (or many) workers are free again will new requests be distributed across them in a more balanced way
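The skew described above can be observed with a self-contained simulation rather than gunicorn itself (a hypothetical sketch, Unix-only due to os.fork; worker_loop/measure are my own names): workers loop on accept() and report their PID, while the client fires back-to-back requests and tallies which PID served each one. On many kernels the accept() wakeup order is roughly LIFO, so a fast sequential burst often lands mostly on one worker.

```python
import os
import signal
import socket
from collections import Counter

def worker_loop(listener: socket.socket) -> None:
    # Each worker accepts connections forever; the kernel picks which
    # blocked worker wakes up for each new connection.
    while True:
        conn, _ = listener.accept()
        conn.recv(64)
        conn.sendall(str(os.getpid()).encode())
        conn.close()

def measure(n_workers: int = 4, n_requests: int = 20) -> Counter:
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(128)
    port = listener.getsockname()[1]

    children = []
    for _ in range(n_workers):
        pid = os.fork()
        if pid == 0:
            worker_loop(listener)        # never returns
        children.append(pid)
    listener.close()

    counts = Counter()                   # requests served, per worker PID
    for _ in range(n_requests):          # back-to-back, one at a time
        c = socket.create_connection(("127.0.0.1", port))
        c.sendall(b"ping")
        counts[c.recv(64).decode()] += 1
        c.close()

    for pid in children:                 # tear the workers down
        os.kill(pid, signal.SIGTERM)
        os.waitpid(pid, 0)
    return counts

if __name__ == "__main__":
    print(measure())                     # often heavily skewed toward one PID
```

Every request is still handled by a worker that was free at that moment; the distribution just isn't equal, which matches aaron's point that this is an OS wakeup-order artifact rather than a queueing bug.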

I am trying to fix that inefficient behavior for the 92 requests mentioned above. No success thus far.

Hopefully someone else can add their insights.

tyrex
  • Why is it "inefficient behaviour" to assign to whichever worker is free? – aaron Dec 25 '22 at 05:08
  • It's ok if the 1st request is assigned to the free worker, but then this worker shouldn't be considered free anymore. However, in my case, also the 2nd, ...,91st AND 92nd are all assigned to that worker. – tyrex Dec 26 '22 at 11:51
  • SyncWorker only accepts a client connection when it is free. Requests are not specifically assigned to a possibly unavailable worker in a pool (see my answer). If a worker handles all those requests, then each of those requests are only coming in after the prior request has been handled. It is likely not an equal distribution, but the worker is certainly free. I'm not sure why you think it shouldn't be "considered free". The "load balancing" reading linked in my answer shows an attempt to achieve an "equal distribution" but there are clear performance penalties due to the significant overhead. – aaron Dec 26 '22 at 12:30
  • Thank you @aaron for providing those details. I am running experiments and am seeing the behavior that I describe, i.e. that a worker gets assigned all those requests BEFORE it has handled the first ones. Since you mention that this shouldn't happen (correct?), then maybe something else (perhaps a bug in my code) is going on. Will investigate again. – tyrex Dec 26 '22 at 21:12
  • I have to ask, are you sure you are using SyncWorker? – aaron Dec 27 '22 at 00:14
  • I am using a FastAPI-specific worker class as follows: `gunicorn main:app --workers 4 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:80`. I will add a minimal, reproducible example, but it may take some time before I get to create and upload it – tyrex Dec 28 '22 at 00:23
  • Well, that's not SyncWorker, which this question is originally about. UvicornWorker does [`loop.create_server()`](https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.loop.create_server), which calls `sock.accept()` on each event loop tick (by design). There's no "fix" for it at the moment — the feature request [Support a concurrency ceiling without returning 503, like gunicorn's worker_connections (encode/uvicorn#865)](https://github.com/encode/uvicorn/issues/865) was rejected in Dec 2020. – aaron Dec 28 '22 at 11:20