I'm trying to stream data from a flask/gunicorn server:

while True:
    # Block until the next item is queued, then emit it as JSON.
    result = json.dumps(tweetQueue.get())
    yield result

However, 30 seconds into the stream, gunicorn times out my connection and stops the stream. How can I configure the timeout so that publishing new data to the stream from the server resets it, and the stream is not terminated?

Thanks!

akn320
  • Perhaps this `gunicorn` parameter is relevant? http://docs.gunicorn.org/en/develop/configure.html#timeout – jaynp Mar 30 '14 at 03:23
  • Yes, I considered that - it seemed really inelegant to just set a large timeout value, and I couldn't see a way to turn off the timeout entirely. Besides, ideally I would still have a useful timeout in case the connection really is lost, but the timeout would be reset when new data is published. – akn320 Mar 30 '14 at 04:37

3 Answers

I am answering my own question after doing some more research.

gunicorn server:app -k gevent

This uses asynchronous workers, which have the benefit of using Connection: keep-alive when serving requests. This allows the request to be served indefinitely.
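
As a rough illustration of what such a worker would serve, here is a minimal sketch of a streaming WSGI callable. It uses plain WSGI instead of Flask so that it is self-contained; `tweet_queue`, the `None` sentinel, and the newline-delimited output are assumptions made for the example, not part of the original question.

```python
import json
import queue

# Hypothetical stand-in for the question's tweetQueue.
tweet_queue = queue.Queue()


def app(environ, start_response):
    """WSGI callable; gunicorn would import it as server:app."""
    # application/x-ndjson is a common convention for JSON lines, not a requirement.
    start_response("200 OK", [("Content-Type", "application/x-ndjson")])

    def generate():
        while True:
            item = tweet_queue.get()  # blocks until an item is queued
            if item is None:          # sentinel (an assumption) that ends the stream
                return
            # One JSON document per line so a client can split the stream.
            yield (json.dumps(item) + "\n").encode("utf-8")

    return generate()
```

Saved as `server.py`, this could be started with the same command as above, `gunicorn server:app -k gevent`.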

akn320
  • This answer sounds good, but unfortunately it didn't work for me. I am using Python 2.7 and gunicorn 0.14.5 and gevent 0.13.6. Which versions are you using? – personal_cloud Sep 14 '17 at 16:44
  • Contrary to what the comment above suggests, this method still works. But remember to install `gevent` before trying this. For reference, first install greenlet using `pip install greenlet` and then gevent using `pip install gevent` – B8vrede Apr 25 '18 at 18:56

Consider using the built-in BaseHTTPServer (a Python 2 module; it became `http.server` in Python 3) instead of gunicorn. The following example launches 100 handler threads on the same port, each started through BaseHTTPServer. It streams fine, supports multiple connections on one port, and generally runs 2X faster than gunicorn too. You can also wrap your socket in SSL if you want that.

import time, threading, socket, SocketServer, BaseHTTPServer

class Handler(BaseHTTPServer.BaseHTTPRequestHandler):

    def do_GET(self):
        if self.path != '/':
            self.send_error(404, "Object not found")
            return
        self.send_response(200)
        self.send_header('Content-type', 'text/html; charset=utf-8')
        self.end_headers()

        # serve up an infinite stream
        i = 0
        while True:
            self.wfile.write("%i " % i)
            time.sleep(0.1)
            i += 1

# Create ONE socket.
addr = ('', 8000)
sock = socket.socket (socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(addr)
sock.listen(5)

# Launch 100 listener threads.
class Thread(threading.Thread):
    def __init__(self, i):
        threading.Thread.__init__(self)
        self.i = i
        self.daemon = True
        self.start()
    def run(self):
        httpd = BaseHTTPServer.HTTPServer(addr, Handler, False)

        # Prevent the HTTP server from re-binding every handler.
        # https://stackoverflow.com/questions/46210672/
        httpd.socket = sock
        httpd.server_bind = httpd.server_close = lambda: None

        httpd.serve_forever()
[Thread(i) for i in range(100)]
time.sleep(9e9)
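
The listing above targets Python 2. For Python 3, a rough equivalent might look like the sketch below. Note that it swaps in a different technique: `ThreadingHTTPServer` (Python 3.7+) spawns one thread per connection instead of pre-launching 100 listener threads on a shared socket. The `start_server` helper is a hypothetical name introduced for this example.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/":
            self.send_error(404, "Object not found")
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()

        # Serve an infinite stream until the client disconnects.
        i = 0
        try:
            while True:
                self.wfile.write(b"%d " % i)
                self.wfile.flush()
                time.sleep(0.1)
                i += 1
        except (BrokenPipeError, ConnectionResetError):
            pass  # client closed the connection

    def log_message(self, format, *args):
        pass  # silence per-request logging


def start_server(port=0):
    """Hypothetical helper: serve in a daemon thread (port 0 = any free port)."""
    httpd = ThreadingHTTPServer(("127.0.0.1", port), Handler)
    threading.Thread(target=httpd.serve_forever, daemon=True).start()
    return httpd
```

Calling `start_server(8000)` and keeping the main thread alive should let a client such as `curl -N http://127.0.0.1:8000/` watch the counter stream in.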

If you insist on using gunicorn anyway, remember to put it (and all its related packages: wsgi, gevent, flask) in a virtualenv to avoid conflicts with other software.

personal_cloud
  • @Hejazzmann. BaseHTTPServer starts *and* runs faster, and is more secure due to it being a better implementation with fewer bugs. gunicorn corrupts [paths](https://stackoverflow.com/questions/45574849/gunicorn-is-corrupting-sys-path), [headers](https://www.cvedetails.com/cve/CVE-2018-1000164/), and has multiple package dependencies that will cause bugs in larger projects when people just `pip` things because they think that "virtualenv is irrelevant". Take your pick, but cut the FUD please. – personal_cloud Oct 23 '19 at 01:05
  • You don't know what you're talking about. BaseHTTPServer is a base class for a webserver construction. Not meant for production as it lacks security/perf features. Even the higher level SimpleHTTPServer is not recommended: "This module defines two classes for implementing HTTP servers (Web servers). Usually, this module isn’t used directly, but is used as a basis for building functioning Web servers. See the SimpleHTTPServer and CGIHTTPServer modules." (...) "Warning SimpleHTTPServer is not recommended for production. It only implements basic security checks." (python doc) – Hejazzman Oct 09 '20 at 07:13

Gunicorn worker processes send "messages" to the master process to let it know they are still alive (see https://github.com/benoitc/gunicorn/blob/master/gunicorn/workers/workertmp.py#L40). However, this is not done while a response is being served (for example, see https://github.com/benoitc/gunicorn/blob/master/gunicorn/workers/sync.py#L160), so if serving the response takes longer than the timeout, the master process kills the worker.
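
The mechanism can be illustrated with a toy model. This is a sketch of the idea only, not gunicorn's actual code: `ToyWorker`, `is_timed_out`, and `serve_stream` are hypothetical names, and time is simulated with plain integers.

```python
class ToyWorker:
    """Toy model of a worker that must notify the master periodically."""

    def __init__(self, now):
        self.last_notify = now

    def notify(self, now):
        self.last_notify = now


def is_timed_out(worker, now, timeout):
    """Master-side check: the worker is killed if its last notify is too old."""
    return now - worker.last_notify > timeout


def serve_stream(worker, start, duration, notify_while_streaming):
    """Simulate streaming one chunk per second for `duration` seconds."""
    now = start
    for _ in range(duration):
        now += 1
        if notify_while_streaming:
            worker.notify(now)
    return now


TIMEOUT = 30  # seconds, matching the 30-second cutoff seen in the question

silent = ToyWorker(now=0)
end = serve_stream(silent, 0, 45, notify_while_streaming=False)
print(is_timed_out(silent, end, TIMEOUT))    # True: no notify for 45 s

chatty = ToyWorker(now=0)
end = serve_stream(chatty, 0, 45, notify_while_streaming=True)
print(is_timed_out(chatty, end, TIMEOUT))    # False: notified on every chunk
```

In this model, a worker that stays silent during a 45-second response is flagged once the 30-second timeout has passed, which mirrors the behaviour described above.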

Martin Indra
  • To keep your answer *alive* you should not only provide links but put some content in your answer so it will be here if the link dies. – storaged Jan 09 '18 at 17:18