During an HTTP request/response, how do you determine the presence and length of the message-body, and in particular, does whatever algorithm is required make use of the request method as input?

I'm interested in both the request and the response. The existence and length of the request's message-body look like they should be determinable from the headers alone (i.e., using the method specified in §4.4 of RFC 2616). That's good for the request, but is it similarly possible to do the same for the response?

RFC2616 §4.4 seems to indicate that the same algorithm should get used for the request and the response (as it's talking in generic terms of message-body), so this would seem to indicate it can be done generically.

However, the HEAD method sticks out like a sore thumb here: Content-Length is sent back as part of the headers during the response, but no body will be.

Is HEAD special, and the only special method? Or can an extension method have similar behavior, so that I actually do need to know, for each method, whether that method requires special handling? (If so, extension methods can't be used unless pre-negotiated outside HTTP.)

Thanatos
  • Not sure enough to list this as an answer, but I read through the request method list recently and I do think HEAD is special here. – Josh from Qaribou May 12 '14 at 00:31
  • Given that a HEAD's response will contain a Content-Length but no body, why do you not think that conflicts with the method outlined in §4.4? – Thanatos May 12 '14 at 00:41

2 Answers

TL;DR - No - An HTTP library cannot 100% reliably determine the presence of a response's message-body without knowing the request method, and this does contradict the spec's other point about using the same algorithm for handling requests and responses.

UPDATE: As @JulianReschke mentions, that section of the spec has been rewritten. Below is my own empirical evidence to further illustrate the point.

§4.4 details a number of factors that determine the true "Message Length", and it lists the type of response (#1) as having higher "precedence" than the value of the Content-Length header (#3). In particular, it mentions that "any response to a HEAD request" is among those which "MUST NOT" include a message-body. So even if the server sends a Content-Length that does not match what follows, the client should know to ignore it based on the response type. This point seems to be followed pretty strictly (as shown below), so the other point, about using the same algorithm for requests and responses, seems to be untrue.
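The precedence described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `response_body_framing` and its tuple return values are made up for this example, the multipart/byteranges rule (#4) is left out, and no error handling is done for malformed headers. The point is that the request method has to be a parameter.

```python
from typing import Mapping, Optional, Tuple


def response_body_framing(
    request_method: str, status: int, headers: Mapping[str, str]
) -> Tuple[str, Optional[int]]:
    """Hypothetical helper: decide how a response body is delimited.

    Follows the order of RFC 2616 section 4.4, minus rule 4
    (multipart/byteranges), which is omitted to keep the sketch short.
    """
    # Header names are case-insensitive, so normalise them first.
    h = {name.lower(): value.strip() for name, value in headers.items()}

    # Rule 1: these responses never carry a body, whatever the headers say.
    if (
        request_method.upper() == "HEAD"
        or 100 <= status < 200
        or status in (204, 304)
    ):
        return ("none", 0)

    # Rule 2: any Transfer-Encoding other than "identity" means the
    # chunked coding delimits the body; Content-Length is ignored.
    if h.get("transfer-encoding", "identity").lower() != "identity":
        return ("chunked", None)

    # Rule 3: otherwise Content-Length gives the body length in bytes.
    if "content-length" in h:
        return ("length", int(h["content-length"]))

    # Rule 5: no length information, so the body runs until the
    # server closes the connection.
    return ("until-close", None)


print(response_body_framing("HEAD", 200, {"Content-Length": "1639"}))
print(response_body_framing("GET", 200, {"Content-Length": "1639"}))
```

With identical status and headers, the two calls give different answers, which is the reason a response parser cannot be written purely in terms of the message itself.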

In point of fact, I tried hitting my own Apache server with different HEAD requests and got very different results for Content-Length, vis-a-vis consistency with how it works for requests. Here are the relevant parts of the requests I sent and the responses I got:


Request: HEAD /

Response: 200 OK

Content-length: 1639

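My web root contains index.html, and 1639 is the size of that file in bytes. Read through the generic §4.4 algorithm, this is inconsistent: the response itself has no message-body, so a Content-Length of 0 would be the matching value, regardless of the size of the file. (RFC 2616 §14.13 does permit this, defining Content-Length on a HEAD response as the size the body would have had for a GET, which is exactly why the request method has to be known.)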


Request: HEAD /someproject

Response: 301 Moved Permanently

/someproject is a directory, and Apache wants to see a slash at the end of the request URI, so it responds with a 301 redirect. Apparently, because the response is not a normal 200, no Content-Length is sent at all, and this omission is to be interpreted as 0. This is consistent.


Request: GET /someproject

Response: 301 Moved Permanently

Content-length: 386

Tried again using GET instead of HEAD, and now I get the content length of the error page that Apache automatically generated to accompany the 301 header. This is consistent, albeit a little strange in light of how it handles the two HEAD requests above.


Request: HEAD /someproject

Accept-Encoding: gzip, deflate

Response: 301 Moved Permanently

Content-length: 20

Back to HEAD, but requesting a gzipped response. This time I get the content-length 20, which is the size of an empty response after gzip encoding is applied. This would be consistent, but no actual 20-byte gzipped message is sent (presumably because it is a HEAD request)!


Request: HEAD /someproject/

Response: 200 OK

The directory does contain index.php, but unlike the first example which returned the filesize of index.html, here Apache does not want to execute the PHP script to find out the content-length of the actual response, so it treats it as a 0. This is consistent with the spec, since no message body is sent anyway, but it's exceedingly inconsistent with the first example where it did send a value. The client has no way of knowing whether the index file is HTML or PHP, so it seems strange that there would only sometimes be a value sent.
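The core observation (a HEAD response that advertises a Content-Length but carries no body) can be reproduced without Apache. The sketch below is an assumption-laden stand-in, not the setup used above: it starts a throwaway server from Python's standard library on a local port, and the names `Handler`, `fetch`, `demo`, and `BODY` are invented for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page content standing in for index.html.
BODY = b"<html>hello</html>"


class Handler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        # Same Content-Length for HEAD and GET, as in the first test above.
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()

    def do_HEAD(self):
        # Headers only: no body is written for HEAD.
        self._send_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(BODY)

    def log_message(self, *args):
        # Silence the default per-request logging.
        pass


def fetch(method, port):
    """Send one request and return (Content-Length header, body bytes)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request(method, "/")
        resp = conn.getresponse()
        return resp.getheader("Content-Length"), resp.read()
    finally:
        conn.close()


def demo():
    # Port 0 asks the OS for any free port.
    server = HTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return fetch("HEAD", port), fetch("GET", port)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    head, get = demo()
    print("HEAD:", head)
    print("GET: ", get)
```

Both responses report the same Content-Length, yet only the GET returns bytes. The client gets this right only because `http.client` remembers which method it sent, which is the same dependency on the request method discussed above.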


So, I agree that the spec contradicts itself, and apparently, so does Apache. If you're designing an HTTP library, I'd suggest you make it as robust as possible to handle all kinds of messages that you are likely to encounter, even if they are not totally to spec.

pieman72

RFC 2616 is obsolete. The description in the new spec has been rewritten completely. See http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p1-messaging-26.html#message.body.length.

Julian Reschke