I wanted to see how Transfer-Encoding: chunked works, i.e. to play with some data streaming. But then I found an example in Flask and read that Flask does streaming its own way, because WSGI cannot handle chunked encoding. Flask uses generators to produce the response body incrementally.

I am not very experienced with HTTP, so please bear with me. I thought that in HTTP/1.1 you have to use Transfer-Encoding: chunked in order to stream data. But apparently not! Is that so, or do I misunderstand something?

How is this Flask example able to stream data without the Transfer-Encoding header? It just writes new data to the connection and that's it, and curl receives it properly.

So another question arises: if streaming is possible without Transfer-Encoding, then why bother using it?

What am I missing here? (I am sure I am missing something :-))

The example:

#!/usr/bin/python3
import time
from datetime import datetime
from flask import Flask, Response
app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello World!"

@app.route('/time')
def doyouhavethetime():
    def generate():
        while True:
            yield "{}\n".format(datetime.now().isoformat())
            time.sleep(1)
    return Response(generate(), mimetype='text/plain')

if __name__ == "__main__":
    app.run()

Curl's output:

curl -v localhost:5000/time
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 5000 (#0)
> GET /time HTTP/1.1
> Host: localhost:5000
> User-Agent: curl/7.58.0
> Accept: */*
> 
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Content-Type: text/plain; charset=utf-8
< Connection: close
< Server: Werkzeug/0.16.0 Python/3.6.9
< Date: Sun, 05 Jan 2020 22:21:25 GMT
< 
2020-01-05T23:21:25.018220
2020-01-05T23:21:26.020679
2020-01-05T23:21:27.021989
2020-01-05T23:21:28.023392
2020-01-05T23:21:29.024697
2020-01-05T23:21:30.025996
2020-01-05T23:21:31.027361

Wireshark's capture:

GET /time HTTP/1.1
Host: localhost:5000
User-Agent: curl/7.58.0
Accept: */*

HTTP/1.0 200 OK
Content-Type: text/plain; charset=utf-8
Connection: close
Server: Werkzeug/0.16.0 Python/3.6.9
Date: Sat, 04 Jan 2020 22:13:15 GMT

2020-01-04T23:13:15.567580
2020-01-04T23:13:16.570090
2020-01-04T23:13:17.571464
2020-01-04T23:13:18.572737
2020-01-04T23:13:19.574004
2020-01-04T23:13:20.575320
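
For what it's worth, the response headers in this capture carry neither Content-Length nor Transfer-Encoding. Below is a minimal sketch of how a client could decide where such a body ends, assuming a simplified reading of the HTTP/1.1 message-framing rules. The function name `body_framing` is hypothetical, and special cases such as HEAD responses and 204/304 statuses are ignored.

```python
def body_framing(headers):
    """Return how a client would find the end of a response body.

    Simplified sketch: ignores HEAD requests and 1xx/204/304 responses.
    """
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value.strip().lower() for name, value in headers.items()}
    if "chunked" in lowered.get("transfer-encoding", ""):
        return "chunked"            # body ends at the zero-length chunk
    if "content-length" in lowered:
        return "content-length"     # body ends after that many bytes
    return "read-until-close"       # body ends when the server closes the socket


# Headers copied from the capture above.
captured = {
    "Content-Type": "text/plain; charset=utf-8",
    "Connection": "close",
    "Server": "Werkzeug/0.16.0 Python/3.6.9",
}
print(body_framing(captured))
```

With the captured headers only the last branch applies, which would match curl's "assume close after body" message.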
  • tl;dr of dupe: The downside to doing it this way is that there's no way to signal the end of the content except for closing the connection, so it has to use `Connection: close` instead of the more efficient `Connection: keep-alive`. – Joseph Sible-Reinstate Monica Jan 05 '20 at 22:29
  • See also [What are the consequences of not including a content-length header in a server response?](https://stackoverflow.com/q/27228343/7509065) and [What happens in HTTP response to a GET request without Content-Length or Transfer-encoding?](https://stackoverflow.com/q/30339894/7509065) – Joseph Sible-Reinstate Monica Jan 05 '20 at 22:31
  • OK, now I think I understand. If neither Content-Length nor Transfer-Encoding is sent, the client cannot determine where the message ends and can only rely on the server closing the TCP connection. That is inefficient, because a subsequent request has to reopen the connection, which has a cost. Thank you guys! – YotKay Jan 05 '20 at 22:43
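
To illustrate the point made in these comments, here is a rough sketch of the chunked wire format: each chunk is prefixed with its size in hexadecimal, and a zero-length chunk marks the end of the body, so the connection can stay open afterwards. The helper names `encode_chunked` and `decode_chunked` are hypothetical, and the decoder assumes the whole message is already in memory and that no trailer fields follow the last chunk.

```python
def encode_chunked(pieces):
    """Frame an iterable of byte strings as a chunked message body."""
    out = b""
    for piece in pieces:
        if piece:  # a zero-length chunk would terminate the body early
            out += b"%x\r\n" % len(piece) + piece + b"\r\n"
    return out + b"0\r\n\r\n"  # last chunk: size 0, then an empty line


def decode_chunked(data):
    """Return (body, leftover bytes that belong to the next message)."""
    body = b""
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # Chunk extensions after ';' are ignored in this sketch.
        size = int(data[pos:line_end].split(b";")[0], 16)
        pos = line_end + 2
        if size == 0:
            # Assumes no trailers: skip only the final CRLF.
            return body, data[pos + 2:]
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF


pieces = [b"2020-01-05T23:21:25.018220\n", b"2020-01-05T23:21:26.020679\n"]
wire = encode_chunked(pieces) + b"NEXT"  # "NEXT" stands in for a following response
body, rest = decode_chunked(wire)
print(body.decode(), rest)
```

Because the receiver knows exactly where the body stops, whatever comes after it on the same connection can be parsed as the next response, which is what makes keep-alive possible for streamed bodies.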
