5

I'm working with an HTTP request tool (similar to cURL) and having an issue with the server response. Either that, or I misunderstand the RFC for HTTP 1.1 and chunked data.

As I understand it, chunked data should be in this format:

4\r\n
Wiki\r\n
5\r\n
pedia\r\n
e\r\n
 in\r\n\r\nchunks.\r\n
0\r\n
\r\n

What I'm actually seeing is the following:

4\r\n
Wiki\r\n
5\r\n
pedia\r\n
e\r\n
 in\r\n\r\nchunks.\r\n
0

In other words, the few servers I've tested with send no more data after the 0: not a CRLF, much less CRLFCRLF.

How are we supposed to know it's the end of the chunked data without the proper format of the chunked tags? Timeouts happen looking for the CRLFs after the 0, and that's not sufficient.
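
For comparison, here is a minimal Ruby sketch of an encoder that produces the well-formed byte sequence shown at the top of the question, including the CRLF after the final 0 and the blank line that closes the body. The helper name `encode_chunked` is hypothetical and not part of any library; trailers and chunk extensions are left out.

```ruby
CRLF = "\r\n"

# Encode a list of strings as an HTTP/1.1 chunked body (no trailers).
def encode_chunked(chunks)
  body = String.new
  chunks.each do |chunk|
    next if chunk.empty?                       # a zero-size chunk would end the body early
    body << chunk.bytesize.to_s(16) << CRLF    # chunk size in hex, then CRLF
    body << chunk << CRLF                      # chunk data, then CRLF
  end
  body << "0" << CRLF                          # last-chunk marker
  body << CRLF                                 # empty trailer section ends the body
  body
end

puts encode_chunked(["Wiki", "pedia", " in\r\n\r\nchunks."]).inspect
```

A receiver can therefore treat the blank line after the `0` size line as the end of the body, with no timeout involved.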

bvstone
  • This is very strange. What sort of server is responding with such an error? What do you have in the Server header? If you had at least one CRLF after the 0 you could say something; here you could still receive more digits, so it's clearly an error. Or maybe there's an error in your parsing code? Do you have a tcpdump or Wireshark capture? – regilero Nov 23 '15 at 19:16
  • No, I don't have those things. This is a socket application that I have written and used for many years. But, I'm running into issues with the chunked data. Anything else seems to work fine except the 0 at the end (ie, no CRLF after it). I read bytes one at a time when reading the chunked length, looking for CRLF so I know it's the end. I then convert that value from hex to dec, and read that amount from the socket. It's when I read the last 0 and go to read one more byte it times out doing a select() on the socket waiting for it to be ready. – bvstone Nov 23 '15 at 19:38
  • If you are sure of the incoming data, then the server is faulty; refer to http://stackoverflow.com/a/2127723/550618 – regilero Nov 23 '15 at 19:39
  • Ok, I did some testing.. I removed the select() from the requests when reading one byte at a time to retrieve chunked length and things worked. Interesting... – bvstone Nov 23 '15 at 19:43
  • yes, so you found an awful issue in your socket based code, good luck :-) – regilero Nov 23 '15 at 19:48
  • @bvstone did you find the reason why the select() call is interfering with your code? I am having the same problem; it's exactly what you just described. It times out after reading the zero. What happens is that select() just times out and doesn't know that there is still some data available to read. – kuchi Dec 01 '17 at 18:11
  • @kuchi - Nope. But some servers seem to be working just fine (PayPal for one seems to require HTTP 1.1 and it works just fine.) I haven't tested this in a while so it's hard to say. – bvstone Dec 01 '17 at 22:33

2 Answers

1

Yes, it violates the standard. But we want to be compatible with all possible HTTP servers and clients, so we have to understand how it can be violated.

Chunked encoding is often used to stream content over HTTP 1.1. The standard requires the content to end with an additional CRLF. So we might see the following pseudocode:

def stream(endpoint)
  Socket.open(endpoint) do |socket|
    sleep 10

    more_data do |data|
      print data.length.to_s(16)
      print data
      print "CRLF"
    end
  end

  print "CRLF" # never reached if anything above raises
end

But the right code is the following:

def stream(endpoint)
  Socket.open(endpoint) do |socket|
    sleep 10

    more_data do |data|
      print data.length.to_s(16)
      print data
      print "CRLF"
    end
  end

ensure
  print "CRLF" # runs even if the block above raises
end

This means that after an interruption of the input socket, or any other exception, the wrong version of the method won't be able to print the additional "CRLF" to the output socket.

How are we supposed to know it's the end of the chunked data without the proper format of the chunked tags? Timeouts happen looking for the CRLFs after the 0, and that's not sufficient.

Many implementations ignore this violation because they don't need to know the size of the content. They just try to receive as much data as possible before the socket is closed.
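
A minimal Ruby sketch of such a tolerant reader follows. It is an illustration under stated assumptions, not how any particular client is implemented: the name `decode_chunked_lenient` is hypothetical, `io` is any IO-like object (a socket or a `StringIO`), and the leniency only helps when the peer actually closes the connection after the final 0.

```ruby
require "stringio"

# Decode a chunked body from an IO-like object.
# Lenient by design: if the stream ends right after the final "0"
# size line, the body is still treated as complete.
def decode_chunked_lenient(io)
  body = String.new
  loop do
    size_line = io.gets("\r\n")               # e.g. "4\r\n", or a bare "0" at EOF
    raise EOFError, "stream ended before the last chunk" if size_line.nil?
    hex = size_line[/\A\h+/]                  # leading hex digits; extensions are ignored
    raise ArgumentError, "bad chunk size line: #{size_line.inspect}" if hex.nil?
    size = hex.to_i(16)
    if size.zero?
      # Skip optional trailer lines up to the blank line; EOF is tolerated here.
      while (line = io.gets("\r\n"))
        break if line == "\r\n"
      end
      return body
    end
    data = io.read(size)
    raise EOFError, "truncated chunk" if data.nil? || data.bytesize < size
    body << data
    io.read(2)                                # CRLF that follows the chunk data
  end
end

truncated = "4\r\nWiki\r\n5\r\npedia\r\n0"   # no CRLF after the final 0
puts decode_chunked_lenient(StringIO.new(truncated))
```

On a keep-alive connection that stays open, this still blocks after the 0, so it does not replace a correct terminator from the server.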

puchu
-1

Use Content-Length whenever you know it; for a file download, checking the file size costs almost nothing in terms of resources. For chunked transfer, a parser does not scan the message body for a CRLF pair. It first reads the specified number of bytes, and then reads two more bytes to confirm that they are CR and LF. If they're not, the message body is ill-formed, and either the size was specified improperly or the data was otherwise corrupted.
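
The read-then-verify step described above can be sketched in Ruby as follows. The helper `read_chunk_data` and the error class `MalformedChunk` are hypothetical names introduced for illustration; `io` is assumed to be positioned just after the chunk-size line.

```ruby
require "stringio"

class MalformedChunk < StandardError; end

# Read exactly `size` bytes of chunk data, then require the two bytes CR LF.
# The data itself is never scanned for CRLF, so it may contain CRLF freely.
def read_chunk_data(io, size)
  data = io.read(size) || ""
  if data.bytesize < size
    raise MalformedChunk, "expected #{size} bytes, got #{data.bytesize}"
  end
  terminator = io.read(2)
  unless terminator == "\r\n"
    raise MalformedChunk, "chunk data not followed by CRLF"
  end
  data
end

io = StringIO.new(" in\r\n\r\nchunks.\r\n0\r\n\r\n")
puts read_chunk_data(io, 0xe).inspect
```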

For more information, read the RFC, which says:

A server using chunked transfer-coding in a response MUST NOT use the trailer for any header fields unless at least one of the following is true:

a) the request included a TE header field that indicates "trailers" is acceptable in the transfer-coding of the response, as described in section 14.39; or,

b) the server is the origin server for the response, the trailer fields consist entirely of optional metadata, and the recipient could use the message (in a manner acceptable to the origin server) without receiving this metadata. In other words, the origin server is willing to accept the possibility that the trailer fields might be silently discarded along the path to the client.

Ways to determine message body length:

If a Transfer-Encoding header field is present and chunked is the final encoding, the message body length is determined by reading and decoding the chunked data until the transfer coding indicates the data is complete.

If a Transfer-Encoding header field is present in a response and chunked is not the final encoding, the message body length is determined by reading the connection until it is closed by the server.

If a Transfer-Encoding header field is present in a request and chunked is not the final encoding, the message body length cannot be determined reliably; the server MUST respond with the 400 (Bad Request) status code and then close the connection.

If a message is received with both a Transfer-Encoding and a Content-Length header field, the Transfer-Encoding overrides the Content-Length. Such a message might indicate an attempt to perform request smuggling or response splitting and ought to be handled as an error. A sender MUST remove the received Content-Length field prior to forwarding such a message downstream.
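
The rules above can be condensed into a small decision function. This is a simplified Ruby sketch, not a complete implementation: the name `body_framing` and its return symbols are hypothetical, `headers` is assumed to be a Hash with lowercase field names, and special cases such as HEAD responses or 204/304 status codes are ignored.

```ruby
# Decide how a message body is delimited.
# Returns :chunked, :until_close, :bad_request, :content_length or :no_body.
def body_framing(headers, response:)
  te = headers["transfer-encoding"]
  if te
    # Transfer-Encoding takes precedence over any Content-Length present.
    codings = te.split(",").map { |c| c.strip.downcase }
    return :chunked if codings.last == "chunked"
    # Chunked is not the final coding: a response runs until close,
    # a request cannot be framed reliably and deserves a 400.
    return response ? :until_close : :bad_request
  end
  return :content_length if headers.key?("content-length")
  # Neither header: a request has no body, a response runs until close.
  response ? :until_close : :no_body
end

p body_framing({ "transfer-encoding" => "gzip, chunked", "content-length" => "10" }, response: true)
```

A stricter caller could additionally reject any message that carries both header fields instead of letting Transfer-Encoding win silently.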

Vineet1982
  • 1
    I do not see how this is a response to the question. And Content-Length is not the way to compute the end of a chunked transfer, for sure. – regilero Nov 23 '15 at 19:17
  • @regilero then how would you determine the length of the document? – Vineet1982 Nov 23 '15 at 19:21
  • By definition, chunked transfer means you have no clue about the size of the document (you wait for the last chunk), and if you have a Content-Length it's only a *help*. Some servers will reject requests using both, as it could be used for HTTP smuggling attacks (by using different sizes with 2 indicators in a protocol which is highly sensitive to the size of the message, like in HTTP tunneling) – regilero Nov 23 '15 at 19:24
  • And the *trailer* information is only the data between the size hex digits and the CRLF; using trailers or not using trailers says nothing about the fact that you MUST have a CRLF at the end of the chunk size line. – regilero Nov 23 '15 at 19:25
  • 1
    As stated, Content-length is not an option. This is HTTP 1.1 using Chunked data. A lot different than using Content length. See https://en.wikipedia.org/wiki/Chunked_transfer_encoding. My issue is that according to the docs, the last 0 should have CRLFCRLF after it. In the tests I've done with 3-4 different servers, they only return 0, no CRLF after them. Now, either apache and IIS servers are doing this wrong, or there's something I'm missing with the last chunk of data. :) – bvstone Nov 23 '15 at 19:33
  • @regilero I have provided the ways to determine message length with the help of Content-Length and Transfer-Encoding. Content-Length helps to determine the length of the document – Vineet1982 Nov 23 '15 at 19:50