TL;DR - No - An HTTP library cannot 100% reliably determine the presence of a response's message-body without knowing the request method, and this does contradict the spec's other point about using the same algorithm for handling requests and responses.
UPDATE: As @JulianReschke mentions, that section of the spec has been rewritten. Below is my own empirical evidence to further illustrate the point.
§4.4 of RFC 2616 details a number of factors that determine the true "Message Length", and it lists the type of response (#1) as having higher "precedence" than the value of the Content-length header (#3). In particular, it says that "any response to a HEAD request" is among those which "'MUST NOT' include a message body". So even if the server sends a faulty header, the client should know to ignore it based on the response type. This rule seems to be followed pretty strictly (as shown below), so the other point, about using the same algorithm for requests and responses, seems to be untrue.
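To make the precedence concrete, here is a minimal sketch of the §4.4 rules applied in order to a response. It assumes header names have already been lowercased; `response_body_length`, `CHUNKED` and `UNTIL_CLOSE` are hypothetical names, and rule #4 (multipart/byteranges) is left out. Note that rule #1 needs the *request* method, which the response alone does not carry.

```python
from typing import Mapping, Union

# Hypothetical sentinels for the two cases that have no fixed byte count.
CHUNKED = "chunked"
UNTIL_CLOSE = "until-close"


def response_body_length(
    request_method: str, status: int, headers: Mapping[str, str]
) -> Union[int, str]:
    """Apply the RFC 2616 section 4.4 rules, in order, to a response.

    Assumes header names in `headers` are already lowercased.
    """
    # Rule 1: 1xx, 204, 304 and any response to HEAD never carry a body,
    # whatever Content-Length says. This is the step that needs the request method.
    if (
        request_method.upper() == "HEAD"
        or 100 <= status < 200
        or status in (204, 304)
    ):
        return 0

    # Rule 2: a Transfer-Encoding other than "identity" means chunked framing.
    transfer_encoding = headers.get("transfer-encoding", "").strip().lower()
    if transfer_encoding and transfer_encoding != "identity":
        return CHUNKED

    # Rule 3: otherwise Content-Length gives the byte count
    # (int() raises ValueError on a malformed value).
    content_length = headers.get("content-length")
    if content_length is not None:
        return int(content_length.strip())

    # Rule 4 (multipart/byteranges) is omitted from this sketch.
    # Rule 5: read until the server closes the connection.
    return UNTIL_CLOSE


# Mirrors the Apache examples: HEAD / with "Content-length: 1639", and GET of a 301 page.
print(response_body_length("HEAD", 200, {"content-length": "1639"}))  # 0
print(response_body_length("GET", 301, {"content-length": "386"}))    # 386
```

The same headers give different answers depending only on the method, which is exactly why a response parser cannot be method-agnostic.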
In point of fact, I tried hitting my own Apache server with different HEAD requests and got Content-length results that varied widely in how consistent they were with the way the header works for requests. Here are the relevant parts of the requests I sent and the responses I got:
Request: HEAD /
Response: 200 OK
Content-length: 1639
My web root contains index.html, and 1639 is the size of that file in bytes. This is inconsistent. In this case, it should send a Content-length of 0, since this response itself has no message-body, regardless of the size of the file.
Request: HEAD /someproject
Response: 301 Moved Permanently
/someproject is a directory, and Apache wants to see a slash at the end of the request URI, so it responds with a 301 redirect. Apparently, because the response is a redirect, no Content-length is sent at all, and this omission is to be interpreted as 0. This is consistent.
Request: GET /someproject
Response: 301 Moved Permanently
Content-length: 386
Tried again using GET instead of HEAD, and now I get the content length of the error page that Apache automatically generated to accompany the 301 header. This is consistent, albeit a little strange in light of how it handles the two HEAD requests above.
Request: HEAD /someproject
Accept-Encoding: gzip, deflate
Response: 301 Moved Permanently
Content-length: 20
Back to HEAD, but requesting a gzipped response. This time I get a Content-length of 20, which is the size of an empty response after gzip encoding is applied. This would be consistent, but no actual 20-byte gzipped message is sent (presumably because it is a HEAD request)!
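The 20-byte figure is easy to check: a gzip stream wrapping zero bytes of payload is a 10-byte header, a 2-byte final (empty) deflate block, and an 8-byte trailer holding the CRC32 and the uncompressed size. Python's standard `gzip` module is used here only as a convenient way to produce such a stream; it is not what Apache runs internally.

```python
import gzip

# Compress an empty payload.
empty_gz = gzip.compress(b"")

# 10-byte header + 2-byte empty deflate block + 4-byte CRC32 + 4-byte size
print(len(empty_gz))  # 20
```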
Request: HEAD /someproject/
Response: 200 OK
The directory does contain index.php, but unlike the first example, which returned the file size of index.html, here Apache does not want to execute the PHP script to find out the content length of the actual response, so it treats it as 0. This is consistent with the spec, since no message body is sent anyway, but it's exceedingly inconsistent with the first example, where it did send a value. The client has no way of knowing whether the index file is HTML or PHP, so it seems strange that a value would only sometimes be sent.
So, I agree that the spec contradicts itself, and apparently, so does Apache. If you're designing an HTTP library, I'd suggest you make it as robust as possible to handle all kinds of messages that you are likely to encounter, even if they are not totally to spec.
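One way to apply that advice to Content-length itself is to parse it defensively instead of trusting it blindly. The helper below, `parse_content_length`, is a hypothetical sketch rather than any library's API. Accepting identical comma-separated duplicates mirrors what RFC 7230 §3.3.2 permits a recipient to do; what the caller does with `None` (fall back to another framing rule, or reject the message) is left as a policy choice.

```python
import re
from typing import Optional

_DIGITS = re.compile(r"[0-9]+")


def parse_content_length(raw: Optional[str]) -> Optional[int]:
    """Leniently parse a Content-Length value (hypothetical helper).

    Returns a non-negative int, or None when the header is missing or
    cannot be trusted, so the caller can fall back to another framing rule.
    """
    if raw is None:
        return None
    # A duplicated header may arrive folded as "386, 386"; accept it only
    # if every copy agrees.
    values = {part.strip() for part in raw.split(",")}
    if len(values) != 1:
        return None
    value = values.pop()
    # ASCII digits only: rejects "", "-5", "0x10", "12abc".
    if not _DIGITS.fullmatch(value):
        return None
    return int(value)


print(parse_content_length(" 386 "))     # 386
print(parse_content_length("386, 386"))  # 386
print(parse_content_length("386, 20"))   # None
```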