
I'm trying to build a Python web server using Django and Waitress, but I'd like to know how Waitress handles concurrent requests, and when blocking may occur.


While the Waitress documentation mentions that multiple worker threads are available, it doesn't provide much information on how they are implemented or how the Python GIL affects them (emphasis my own):

When a channel determines the client has sent at least one full valid HTTP request, it schedules a "task" with a "thread dispatcher". The thread dispatcher maintains a fixed pool of worker threads available to do client work (by default, 4 threads). If a worker thread is available when a task is scheduled, the worker thread runs the task. The task has access to the channel, and can write back to the channel's output buffer. When all worker threads are in use, scheduled tasks will wait in a queue for a worker thread to become available.

There doesn't seem to be much information on Stack Overflow either. From the question "Is Gunicorn's gthread async worker analogous to Waitress?":

Waitress has a master async thread that buffers requests, and enqueues each request to one of its sync worker threads when the request I/O is finished.


These statements don't address the GIL (at least from my understanding) and it'd be great if someone could elaborate more on how worker threads work for Waitress. Thanks!

evantkchong
  • Did you get a solution to this? – variable Mar 06 '20 at 09:17
  • @variable Unfortunately not. From briefly looking at the [waitress github repo](https://github.com/Pylons/waitress), it doesn't seem like they did anything to work around the GIL, although I cannot say for certain. For the moment my team is sticking with Waitress as our app doesn't require too high a level of concurrency. – evantkchong Mar 06 '20 at 09:21
  • When using the default dev flask server, we can set the number of processes using https://werkzeug.palletsprojects.com/en/1.0.x/serving/#werkzeug.serving.run_simple - does this not exist in waitress? – variable Mar 06 '20 at 09:44
  • Yes the number of workers can be configured but this says nothing of their blocking behavior – evantkchong Mar 06 '20 at 09:46
  • If a worker means an independent process, then this means that each process has its own Python interpreter, isn't it? – variable Mar 06 '20 at 09:48
  • Yes it does. Perhaps I didn't express my concerns correctly. In the documentation it is stated that "I/O is always done asynchronously (by wasyncore) in the main thread. Worker threads never do any I/O." If two tasks are scheduled at the same time which involve I/O of the same resource, how does (or doesn't) the thread dispatcher deconflict this? – evantkchong Mar 06 '20 at 10:25
  • Can we configure the number of worker processes in waitress? Is it same as the threads option? – variable Mar 06 '20 at 11:16
  • Waitress won't block when a slow client takes time to respond. – Tiago Martins Peres Mar 09 '20 at 16:50
  • I don't entirely understand your question. I'm not familiar with Waitress, but it sounds like it has a listener thread that accepts connections, then pushes the so-called 'channel' to a queue for workers. Workers then pick this up and serve the client. It's unrelated to the GIL, which sits at a much lower level. Maybe this will help you better understand the GIL - https://opensource.com/article/17/4/grok-gil – Chen A. Mar 13 '20 at 08:13
  • @ChenA. - does a worker mean a thread or a separate process? – variable Mar 13 '20 at 11:52
  • @variable worker is a paradigm; it can refer to both. It depends on the implementation – Chen A. Mar 13 '20 at 12:59
  • @ChenA. I know what the GIL entails. It is the implementation that we need details on – evantkchong Mar 14 '20 at 11:34

2 Answers


Here's how event-driven asynchronous servers generally work:

  • Start a process and listen for incoming requests. Using the operating system's event notification API makes it easy to serve thousands of clients from a single thread/process.
  • Since only one process manages all the connections, you don't want to perform any slow (or blocking) tasks in it, because that would block the loop for every client.
  • To perform blocking tasks, the server delegates them to "workers". Workers can be threads (running in the same process) or separate processes (or subprocesses). The main process can then keep serving clients while the workers perform the blocking tasks.
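The dispatcher pattern described above can be sketched as a toy model in Python. This is only an illustration of the general pattern, not Waitress's actual code; the pool size of 4 just mirrors the default quoted in the docs:

```python
import queue
import threading

# Toy model of the thread-dispatcher pattern: a "main" loop enqueues
# ready tasks, and a fixed pool of worker threads pulls them off and
# runs the (potentially blocking) work.

task_queue = queue.Queue()
results = []
results_lock = threading.Lock()

def worker():
    while True:
        task = task_queue.get()
        if task is None:          # sentinel: shut this worker down
            task_queue.task_done()
            break
        value = task()            # run the blocking task
        with results_lock:
            results.append(value)
        task_queue.task_done()

# Fixed pool of 4 workers, mirroring Waitress's default.
pool = [threading.Thread(target=worker) for _ in range(4)]
for t in pool:
    t.start()

# The "main thread" schedules tasks; when all workers are busy,
# excess tasks simply wait in the queue.
for i in range(10):
    task_queue.put(lambda i=i: i * i)

task_queue.join()
for _ in pool:
    task_queue.put(None)
for t in pool:
    t.join()

print(sorted(results))  # → [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```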

How does Waitress handle concurrent tasks?

Pretty much the same way I described above. And for workers it creates threads, not processes.

how the python GIL affects them

Waitress uses threads for workers. So, yes, they are affected by the GIL: they aren't truly parallel, even though they appear to be. "Concurrent, but not parallel" is the accurate description.

Threads in Python run inside a single process, and because of the GIL only one of them executes Python bytecode at any moment, so they don't run in parallel. A thread acquires the GIL for a very small amount of time, executes some code, and then the GIL is acquired by another thread.
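You can see this on a standard (GIL-enabled) CPython build: two threads doing pure-Python counting take roughly as long as doing both counts one after the other, because only one thread can hold the GIL at a time. A quick sketch (exact timings depend on your machine):

```python
import threading
import time

N = 2_000_000
totals = [0, 0]

def count(slot):
    # Pure-Python bytecode: the thread must hold the GIL the whole time.
    total = 0
    for _ in range(N):
        total += 1
    totals[slot] = total

start = time.perf_counter()
threads = [threading.Thread(target=count, args=(s,)) for s in (0, 1)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Both counts finish correctly, but the threads took turns on the GIL
# rather than running at the same time.
print(totals, f"elapsed: {time.perf_counter() - start:.2f}s")
```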

But since the GIL is released during network I/O, the main thread can always acquire the GIL whenever there's a network event (such as an incoming request), so you can rest assured that the GIL will not hold up the network-bound operations (like receiving requests or sending responses).
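This is easy to demonstrate: `time.sleep` releases the GIL just like socket I/O does, so four threads each "waiting" 0.2 s finish in about 0.2 s of wall time, not 0.8 s. A small sketch (timings are approximate):

```python
import threading
import time

# Blocking calls that release the GIL (here time.sleep, standing in
# for network I/O) overlap across threads.

def fake_io():
    time.sleep(0.2)   # releases the GIL while waiting, like socket I/O

start = time.perf_counter()
threads = [threading.Thread(target=fake_io) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

print(f"elapsed: {elapsed:.2f}s")  # roughly 0.2 s, not 0.8 s
```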

Python processes, on the other hand, can run truly in parallel on multiple CPU cores. But Waitress doesn't use processes.

Should you be worried?

If you're just doing small blocking tasks like database read/writes and serving only a few hundred users per second, then using threads isn't really that bad.

For serving a large volume of users or doing long-running blocking tasks, you can look into using external task queues like Celery. This will be much better than spawning and managing processes yourself.

xyres
  • Is it better to use a process based app server to processes more requests? – variable Mar 13 '20 at 10:00
  • @variable If you're doing CPU bound tasks (aka blocking tasks) like heavy calculations, then, yes, using process workers is better. But there are projects like Celery which help you to run blocking tasks in separate "task queues". So it doesn't matter what kind of app server you're using. But just for doing network bound tasks (like waiting for client requests, or fetching data from third party API) then you don't need workers. – xyres Mar 13 '20 at 10:34
  • @variable And if by "process based" server you meant a server which creates new process for every request, then no, that is the least scalable way. The most efficient (and common) way is what I described on top of the answer: serve all requests from a single main process and delegate blocking tasks to workers (threads or subprocesses). – xyres Mar 13 '20 at 10:45
  • By "delegate blocking tasks to workers (threads or subprocesses)" - do you mean Celery? – variable Mar 13 '20 at 10:53
  • @variable You can yourself maintain a pool of subprocesses in your program and pass them the blocking tasks. For smaller projects this approach is okay. Celery will give you the advantage of easy scalability. You can easily run it on a single server or a cluster of servers depending on your needs. For smaller projects it may be an overkill, though. You can switch to Celery if and when you need it. – xyres Mar 13 '20 at 11:12
  • If you see the comment by @Chen A. on the question, you will notice that he seems to hint that a worker is a separate process and not a thread. can you confirm? – variable Mar 13 '20 at 11:53
  • @variable No, "worker" can mean anything - a separate thread or a separate process. Waitress creates threads; see the source code: https://github.com/Pylons/waitress/blob/cbc89bf742ef7cbca17671ec9acd3898491c378f/waitress/task.py#L44 – xyres Mar 13 '20 at 12:36
  • @xyres Thanks for the rundown. Your explanation is more explicit than what is given in the documentation. – evantkchong Mar 14 '20 at 11:38
  • @xyres - As I understand, waitress server is thread based; gevent server is sub-process based. Both run off a single main process. We also have an option to add a separate task queue (example celery) which is supported by other thread based and sub process based servers. Are there any other options? Just curious. – variable Jun 09 '20 at 11:11
  • Assuming there are 4 threads and the database query (io bound) takes 1 minute to complete. Suppose 4 requests arrive on the server and are under process, now when a 5th request arrives then will it get blocked as 4 threads are busy handling requests or will 5th request be accepted since the 4 threads are waiting for IO (db query) – variable Feb 07 '22 at 14:11
  • @variable Since the task is io bound, it can be performed asynchronously by delegating it to the operating system. For io bound tasks, you don't really need multiple threads. You can serve all the requests from a single process/thread. – xyres Feb 07 '22 at 14:25
  • You are saying that the 5th request will be served even when all 4 threads are busy handling the 4 requests. I think you mean queued? But will this 5th request wait for one of the 4 requests to finish before the code from the 5th request can be executed? Or will the code from the 5th request run even though 4 threads are currently handling the 4 requests? – variable Feb 07 '22 at 15:10
  • @variable Yes, if the IO is done asynchronously, then the threads are free to process the 5th request even though earlier requests are still pending. That's why it's called async programming because it's non-sequential. If the 5th request doesn't do any long tasks, it can even complete before the earlier requests. – xyres Feb 07 '22 at 17:33
  • @xyres great answer. Apart from using a queue like Celery for CPU-bound blocking ops, are there any alternates to waitress web server (like gunicorn, uwsgi) that can process multiple such requests truly concurrently? – LucyDrops Oct 13 '22 at 17:15
  • @LucyDrops Yes, Gunicorn and uWSGI support true concurrency using multiple process workers (aka forks). Each worker is a new OS process with its own Python interpreter. – xyres Oct 14 '22 at 13:45

Hint: Those were my comments to the accepted answer and the conversation below, moved to a separate answer for space reasons.

Wait... The 5th request will stay in the queue until one of the 4 threads is done with its previous request, and has therefore gone back to the pool. One thread will only ever serve one request at a time. "IO bound" tasks only help in that a thread waiting for IO will implicitly (e.g. by calling time.sleep) tell the scheduler (Python's internal one) that it can pass the GIL along to another thread, since there's currently nothing to do, so the others get more CPU time for their work. On the thread level this is fully sequential, which is still concurrent and asynchronous on the process level, just not parallel. Just to get some wording straight.

Also, Python threads are "standard" OS threads (like those in C), so they can be scheduled across all CPU cores. The only thing restricting them is that they need to hold the GIL when calling Python C-API functions, because the API in general is not thread-safe. On the other hand, calls to non-Python functions - i.e. functions in C extensions like numpy, but also many database APIs, including anything loaded via ctypes - do not hold the GIL while running. Why should they? They are running external C binaries which know nothing of the Python interpreter in the parent process. Therefore, such tasks will run truly in parallel when called from a WSGI app hosted by Waitress. And if you've got more cores available, turn the thread count up to that amount (the threads=X kwarg on waitress.create_server).
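For completeness, here is a minimal sketch of that setup. The app body and the host/port values are placeholders; `threads` is the keyword argument Waitress actually accepts (on both `waitress.serve` and `waitress.create_server`):

```python
def app(environ, start_response):
    # Trivial WSGI app; Waitress runs each call on one of its worker threads.
    body = b"Hello from a worker thread\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]

def main():
    # Requires `pip install waitress`; serve() blocks until interrupted.
    from waitress import serve
    serve(app, host="127.0.0.1", port=8080, threads=8)  # match your core count
```

Call `main()` to start the server; raising `threads` beyond the number of cores mostly helps when the workers spend their time in GIL-releasing C calls as described above.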

Jeronimo