
I'm building an app with Flask, but I don't know much about WSGI or its HTTP base, Werkzeug. When I start serving a Flask application with Gunicorn and 4 worker processes, does this mean that it can handle 4 concurrent requests?

I do mean concurrent requests, and not requests per second or anything else.

serv-inc
Carson

4 Answers


When running the development server - which is what you get by running app.run() - you get a single synchronous process, which means at most 1 request is being processed at a time.

By sticking Gunicorn in front of it in its default configuration and simply increasing the number of --workers, what you get is essentially a number of processes (managed by Gunicorn) that each behave like the app.run() development server. 4 workers == 4 concurrent requests. This is because Gunicorn uses its included sync worker type by default.
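As a minimal sketch of that default setup (where myapp:app is a placeholder for your module and Flask instance):

```shell
# 4 sync workers: at most 4 requests are processed at the same time
gunicorn --workers 4 myapp:app
```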

It is important to note that Gunicorn also includes asynchronous workers, namely eventlet and gevent (and also tornado, but that's best used with the Tornado framework, it seems). By specifying one of these async workers with the --worker-class flag, what you get is Gunicorn managing a number of async processes, each of which manages its own concurrency. These processes don't use threads, but instead coroutines. Basically, within each process, still only 1 thing can be happening at a time (1 thread), but objects can be 'paused' when they are waiting on external processes to finish (think database queries or waiting on network I/O).

This means, if you're using one of Gunicorn's async workers, each worker can handle many more than a single request at a time. Just how many workers is best depends on the nature of your app, its environment, the hardware it runs on, etc. More details can be found on Gunicorn's design page and notes on how gevent works on its intro page.
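A sketch of such an async configuration (assuming the gevent package is installed; myapp:app and the connection count are placeholders to tune for your app):

```shell
# gevent workers: each process multiplexes many requests via coroutines,
# so concurrency is roughly workers * worker-connections for I/O-bound apps
gunicorn --workers 4 --worker-class gevent --worker-connections 1000 myapp:app
```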

mathmax
Ryan Artecona
  • Gunicorn now supports "real" threads since version 19. See [this](http://docs.gunicorn.org/en/stable/design.html#how-many-threads) and [this](http://docs.gunicorn.org/en/stable/settings.html#threads). – Filipe Correia Oct 05 '16 at 23:48
  • How does one keep track of which resources get shared (and how) and which are completely separate between threads/processes? For example, how would I handle a situation where I want to share a huge data structure between several processes handled by Gunicorn and used in the Flask handlers? – jpp1 Jun 20 '18 at 09:51
  • What you are asking, @Johsm, is like asking how to share data between different processes within the operating system. The answer is that you have to use external storage, since processes don't share their memory with other processes. Gunicorn is here only to take advantage of multiprocessing CPU architectures; it doesn't handle those issues. – adkl Jun 25 '19 at 09:57
  • What about Eve? Does this hold for Eve as well? – Eswar Oct 15 '19 at 06:34
  • The Flask development server uses threads by default since v1.0 (https://github.com/pallets/flask/pull/2529) – hychou Oct 24 '19 at 09:19
  • @hychou yes it does. I included an [answer](https://stackoverflow.com/a/50710912/3908170) explaining how it's possible – DarkCygnus Jun 10 '20 at 03:28
  • I am facing the same issue. For example, if I set -w 10 my app only handles 10 parallel requests at a time. But in production my app may need to handle 1000 requests at a time. How can I handle 1000 requests in parallel? A gevent worker connection is not a solution for this because it is not parallel. – mg52 Aug 16 '20 at 13:00
  • Is it possible to dynamically increase the number of workers in Gunicorn while it is still running, through the Flask API? @Ryan Artecona – Vikram Ranabhatt Nov 23 '20 at 10:20
  • Would the GIL impact the threads of a Flask app? – Venkataramana Jan 06 '21 at 11:35

Currently there is a far simpler solution than the ones already provided. When running your application, you just have to pass the threaded=True parameter to the app.run() call, like:

app.run(host="your.host", port=4321, threaded=True)

Another option, as the Werkzeug docs show, is to use the processes parameter, which receives a number > 1 indicating the maximum number of concurrent processes to handle:

  • threaded – should the process handle each request in a separate thread?
  • processes – if greater than 1 then handle each request in a new process up to this maximum number of concurrent processes.

Something like:

app.run(host="your.host", port=4321, processes=3)  # up to 3 processes

More info on the run() method here, and the blog post that led me to find the solution and api references.


Note: the Flask docs on the run() method indicate that using it in a production environment is discouraged because (quote): "While lightweight and easy to use, Flask’s built-in server is not suitable for production as it doesn’t scale well."

However, they do point to their Deployment Options page for the recommended ways to do this when going for production.

DarkCygnus
  • @Coffee_fan you are right. Even on the latest 1.1.x they discourage that, and instead suggest checking their page on [Deployment Options](https://flask.palletsprojects.com/en/1.1.x/deploying/#deployment) when going for production. Including your valuable observation in the answer :) – DarkCygnus Jun 10 '20 at 03:30

Flask will process one request per thread at a time. If you have 2 processes with 4 threads each, that's 8 concurrent requests.

Flask doesn't spawn or manage threads or processes. That's the responsibility of the WSGI gateway (e.g. Gunicorn).
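The 2-processes-by-4-threads arithmetic above maps directly onto Gunicorn flags (myapp:app is a placeholder for your module and Flask instance):

```shell
# 2 worker processes x 4 threads each = up to 8 concurrent requests
gunicorn --workers 2 --threads 4 myapp:app
```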

jd.

No - you can definitely handle more than that.

It's important to remember that deep deep down, assuming you are running a single-core machine, the CPU really only runs one instruction* at a time.

Namely, the CPU can only execute a very limited set of instructions, and it can't execute more than one instruction per clock tick (many instructions even take more than 1 tick).

Therefore, most concurrency we talk about in computer science is software concurrency. In other words, there are layers of software implementation that abstract the bottom level CPU from us and make us think we are running code concurrently.

These "things" can be processes, which are units of code that get run concurrently in the sense that each process thinks it's running in its own world with its own, non-shared memory.

Another example is threads, which are units of code inside processes that allow concurrency as well.

The reason your 4 worker processes will be able to handle more than 4 requests is that they will fire off threads to handle more and more requests.

The actual request limit depends on the HTTP server chosen, I/O, OS, hardware, network connection, etc.
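The benefit of threads for I/O-bound work can be sketched in plain Python (a toy illustration, not Gunicorn's actual implementation): eight simulated "requests" that each wait on I/O finish in roughly the time of one, because their waits overlap.

```python
import threading
import time

def handle_request(i, results):
    # simulate an I/O-bound request (e.g. a database query);
    # while this thread sleeps, the others keep running
    time.sleep(0.1)
    results.append(i)

results = []
threads = [threading.Thread(target=handle_request, args=(i, results))
           for i in range(8)]

start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start

# all 8 "requests" complete in roughly 0.1s total rather than 0.8s,
# because the sleeps overlap across threads
print(len(results), elapsed)
```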

Good luck!

*instructions are the very basic commands the CPU can run. Examples: adding two numbers, jumping from one instruction to another.

user1094786
  • Is it gunicorn spawning the threads or Flask? I found no evidence supporting either possibility. – jd. Jun 08 '12 at 10:04
  • Sure, I understand that about the processes, but the answer says more threads are spawned as needed. That's what I'd like to have confirmation of. – jd. Jul 07 '12 at 05:31
  • This answer doesn't contain any information about Flask or gunicorn. – jwg Sep 20 '17 at 12:57
  • @jd. From my understanding, spawning of threads is based on the worker type (worker_class) you have chosen for Gunicorn to run with. See http://docs.gunicorn.org/en/stable/settings.html#worker-processes for more. – darkdefender27 Nov 08 '17 at 06:53
  • _"deep deep down, assuming you are running a single core machine, the CPU really only runs one instruction* at a time"_ This is not correct on modern machines. Most modern CPUs are [pipelined](https://en.wikipedia.org/wiki/Instruction_pipelining) and [superscalar](https://en.wikipedia.org/wiki/Superscalar_processor), where even a single core has multiple execution units and an instruction decoder that converts the "machine code" seen from the software side into the actual hardware micro-ops that are dispatched to the individual execution units. – Michael Geary Aug 10 '19 at 17:23
  • To clarify: way back in the day, CPUs actually did directly execute the numeric instructions in an executable - the machine code. Every CPU reference had an instruction timing chart showing how many clock cycles each instruction took, including any memory references. So you could just add up the timings to know how long any piece of code would take. Modern CPUs are not like that at all. One interesting exception is the [BeagleBone](https://beagleboard.org/black), which has a modern superscalar ARM processor _and_ two old-fashioned "PRU" processors with fixed instruction timing. – Michael Geary Aug 10 '19 at 23:21
  • And to clarify _that_: when I said "modern" I was using it as a loose shorthand for processors like ARM/Intel/AMD chips - pipelined, superscalar, etc. Of course there are also modern processors that work the old way with fixed timing per instruction, like the BeagleBone PRUs I mentioned and various new microcontrollers. (And now back to Gunicorn!) – Michael Geary Aug 10 '19 at 23:27
  • Even on a single CPU, or an imaginary fixed-timing CPU, multiple threads/processes have a lot of benefits: while one request waits for the disk, database, network, or other stuff, another request can do some independent work. – ego2dot0 Feb 04 '20 at 22:13
  • Just also be aware that Python has a "Global Interpreter Lock": except for some packages which carefully release that lock, no matter how many CPUs you have, you'll have only one Python instruction running at a time. So again, just be aware of how your multithreader is actually working; you may not be getting what you think. https://wiki.python.org/moin/GlobalInterpreterLock – AlanW Sep 20 '22 at 11:53