
My Flask application will receive a request, do some processing, and then make a request to a slow external endpoint that takes 5 seconds to respond. It looks like running Gunicorn with Gevent will allow it to handle many of these slow requests at the same time. How can I modify the example below so that the view is non-blocking?

import requests

@app.route('/do', methods=['POST'])
def do():
    result = requests.get('slow api')
    return result.content

gunicorn server:app -k gevent -w 4
davidism
JLTChiu
  • What do you expect would happen here? You can't return anything to the client if you haven't received it yet – Wayne Werner Sep 28 '16 at 20:14
  • I was expecting to make it async so when it's waiting for the super slow api the cpu power can be used to handle other incoming requests that can potentially be going to the other path. (Since I assume this application will receive lots of other different incoming requests) – JLTChiu Sep 28 '16 at 20:16
  • That doesn't mean what you think it means. And Gunicorn *should* be handling this for you, you could test to make sure just by adding a `time.sleep(30)` in there, I think. It's called the reactor pattern, but Gunicorn allows the client to connect, and then passes off the request to a worker. When the worker finishes, it returns the data from the worker and then puts it back in the pool. I'm not sure if it spins up a new worker if all the existing ones are busy, though. – Wayne Werner Sep 28 '16 at 20:24
  • I am still learning this, but I expect running Gunicorn should be something like `gunicorn server:app -k gevent -w 4` but I am really not sure. – JLTChiu Sep 28 '16 at 20:27
  • @WayneWerner, do you mean that with the current code I posted above, when a specific request is waiting for the slow api to response, it will just use the cpu power to process other incoming requests to the application server? – JLTChiu Sep 28 '16 at 20:29
  • Well, the CPU for sure – that's your OS that's going to handle that. When the underlying socket does a `.read`, the OS is going to say, "Oh sweet, this process is blocked, lemme do something else". – Wayne Werner Sep 28 '16 at 20:34
  • @WayneWerner All I want is to achieve as much rps as possible, I had experience that slow blocking IO will significantly reduce the rps my application server can process, and I want to avoid that (maybe with gevent?) – JLTChiu Sep 28 '16 at 20:48
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/124470/discussion-between-wayne-werner-and-jltchiu). – Wayne Werner Sep 28 '16 at 21:00

3 Answers


If you're deploying your Flask application with gunicorn, it is already non-blocking. If a client is waiting on a response from one of your views, another client can make a request to the same view without a problem. There will be multiple workers to process multiple requests concurrently. No need to change your code for this to work. This also goes for pretty much every Flask deployment option.
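The principle can be illustrated with the standard library alone (this sketch uses threads rather than Flask or gunicorn workers, but the effect is the same): calls that spend their time blocked on I/O overlap when several of them run concurrently, so the total wall time is roughly that of one call, not the sum. Here `time.sleep` stands in for the slow external API:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_call(_):
    """Stands in for a view that blocks on a slow external API."""
    time.sleep(0.2)
    return 'done'

start = time.monotonic()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(slow_call, range(4)))
elapsed = time.monotonic() - start

# the four blocked calls overlap: elapsed is roughly 0.2s, not 0.8s
```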

sytech
  • In this case the OP is talking about making another blocking network call from inside his view. Which is different from the scenario in your answer. – e4c5 Oct 10 '16 at 10:36
  • OP asked "How can I modify the example below so that the view is non-blocking?" The view is already nonblocking. Of course the `requests.get` blocks, but this action happens in a manner such that another client can still access the same view, which I believe was OP's main concern. You could make the api call nonblocking, too, but that doesn't help because you won't be able to return anything to the client until it completes, anyhow. – sytech Oct 10 '16 at 12:21
  • Really? Perhaps I misunderstood the question. My understanding was that he wanted requests to be non-blocking – e4c5 Oct 10 '16 at 12:22
  • I think because OP thought that the server must finish serving one request before it can serve another client. There's some clarifications that were made in the question comments. – sytech Oct 10 '16 at 12:24
  • When working with slow API synchronously it may be good idea to increase the gunicorn default timeout e.g.: `gunicorn ... --timeout 60`. Default value is just 30 seconds. https://stackoverflow.com/questions/6816215/gunicorn-nginx-timeout-problem – fmalina Jun 05 '19 at 20:27

First, a bit of background: a blocking socket is the default kind of socket. Once you start reading, your app or thread does not regain control until data is actually read or you are disconnected. This is how python-requests operates by default. There is a spin-off called grequests which provides non-blocking reads.
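To make the distinction concrete, here is a minimal stdlib sketch (a `socketpair` stands in for a real network connection): in blocking mode `recv` would hang until data arrives; in non-blocking mode it returns immediately with an error instead.

```python
import socket

left, right = socket.socketpair()  # stands in for a real network connection
left.setblocking(False)            # switch from the default blocking mode

try:
    left.recv(1024)                # nothing has been sent yet...
    got_data = True
except BlockingIOError:            # ...so a non-blocking read fails fast
    got_data = False               # a blocking socket would have hung here

right.sendall(b'ping')
data = left.recv(1024)             # now there is data, so recv succeeds
```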

The major mechanical difference is that send, recv, connect and accept can return without having done anything. You have (of course) a number of choices. You can check return code and error codes and generally drive yourself crazy. If you don’t believe me, try it sometime

Source: https://docs.python.org/2/howto/sockets.html

It also goes on to say:

There’s no question that the fastest sockets code uses non-blocking sockets and select to multiplex them. You can put together something that will saturate a LAN connection without putting any strain on the CPU. The trouble is that an app written this way can’t do much of anything else - it needs to be ready to shuffle bytes around at all times.

Assuming that your app is actually supposed to do something more than that, threading is the optimal solution
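The select-based multiplexing the HOWTO describes can be sketched with the stdlib alone: `select.select` reports which sockets are ready, so one loop can service many connections without blocking on any of them (again, a `socketpair` stands in for real network peers).

```python
import select
import socket

left, right = socket.socketpair()
left.setblocking(False)
right.setblocking(False)

right.sendall(b'ping')  # a peer sends us some bytes

# ask the OS which of our sockets are readable -- this is the core of
# the reactor pattern: wait on many sockets in a single place
readable, _, _ = select.select([left], [], [], 1.0)
data = left.recv(1024) if left in readable else b''
```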

But do you want to add a whole lot of complexity to your view by having it spawn its own threads, particularly when gunicorn already has async workers?

The asynchronous workers available are based on Greenlets (via Eventlet and Gevent). Greenlets are an implementation of cooperative multi-threading for Python. In general, an application should be able to make use of these worker classes with no changes.

and

Some examples of behavior requiring asynchronous workers: Applications making long blocking calls (Ie, external web services)

So to cut a long story short: don't change anything, just let it be. If you make any changes at all, let it be to introduce caching. Consider using CacheControl, an extension recommended by the python-requests developers.

e4c5

You can use grequests. It allows other greenlets to run while the request is made. It is compatible with the requests library and returns a requests.Response object. The usage is as follows:

import grequests

@app.route('/do', methods=['POST'])
def do():
    # grequests.map() takes a list of requests and returns a list of responses
    results = grequests.map([grequests.get('slow api')])
    return results[0].content

Edit: I ran a test and saw that the time didn't improve with grequests, since gunicorn's gevent worker already performs monkey-patching when it is initialized: https://github.com/benoitc/gunicorn/blob/master/gunicorn/workers/ggevent.py#L65

jerry