
What is the current state of affairs when it comes to whether to do

Transfer-Encoding: gzip

or a

Content-Encoding: gzip

when I want to allow clients with e.g. limited bandwidth to signal their willingness to accept a compressed response, while the server has the final say on whether or not to compress.

The latter is what e.g. Apache's mod_deflate and IIS do, if you let them take care of compression. Depending on the size of the content to be compressed, they will additionally apply Transfer-Encoding: chunked.

They will also include a Vary: Accept-Encoding header, which already hints at the problem: Content-Encoding seems to be part of the entity, so changing the Content-Encoding amounts to changing the entity, i.e. a different Accept-Encoding header means e.g. a cache cannot use its cached version of the otherwise identical entity.
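To make that negotiation concrete, here is a minimal sketch in Python using the standard library's http.client, which performs no transparent decompression, so the raw headers and body are visible as sent. The host name is just a placeholder, and whether the response actually comes back compressed depends on the server:

```python
# Observe Content-Encoding negotiation by hand. http.client dechunks the
# body for us but does NOT decompress it, so we see what the server sent.
import gzip
import http.client

conn = http.client.HTTPSConnection("example.com")  # placeholder host
conn.request("GET", "/", headers={"Accept-Encoding": "gzip"})
resp = conn.getresponse()

print(resp.getheader("Content-Encoding"))  # e.g. "gzip" if the server compressed
print(resp.getheader("Vary"))              # e.g. "Accept-Encoding"

body = resp.read()
if resp.getheader("Content-Encoding") == "gzip":
    body = gzip.decompress(body)  # undo the entity-level encoding ourselves
```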

Is there a definitive answer on this that I have missed (and that's not buried inside a message in a long thread in some Apache newsgroup)?

My current impression is:

  • Transfer-Encoding would in fact be the right way to do what is mostly done with Content-Encoding by existing server and client implementations
  • Content-Encoding, because of its semantic implications, carries a couple of issues (what should the server do to the ETag when it transparently compresses a response?)
  • The reason is a chicken-and-egg problem: browsers don't support it because servers don't, because browsers don't

So I am assuming the right way would be a Transfer-Encoding: gzip (or, if I additionally chunk the body, Transfer-Encoding: gzip, chunked). And there would be no reason to touch Vary, ETag or any other header in that case, as it's a transport-level thing.
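For illustration, here is a hedged sketch of what such a response could look like on the wire, built by hand in Python (no mainstream server emits gzip as a transfer-coding out of the box, which is precisely the problem):

```python
# Hand-build an HTTP/1.1 response using "Transfer-Encoding: gzip, chunked".
# Per RFC 2616, gzip is applied first, then chunked framing goes on top.
import gzip

payload = b"Hello, world! " * 100
compressed = gzip.compress(payload)

def chunk(data: bytes, size: int = 1024) -> bytes:
    """Apply HTTP/1.1 chunked framing: hex size, CRLF, data, CRLF; zero chunk ends it."""
    out = b""
    for i in range(0, len(data), size):
        piece = data[i:i + size]
        out += f"{len(piece):x}\r\n".encode() + piece + b"\r\n"
    return out + b"0\r\n\r\n"

response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Transfer-Encoding: gzip, chunked\r\n"  # transport-level only, so...
    b'ETag: "v1"\r\n'                        # ...ETag and Vary stay untouched
    b"\r\n"
) + chunk(compressed)
```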

For now I don't care too much about the 'hop-by-hop'-ness of Transfer-Encoding, something that others seem to be concerned about first and foremost, because proxies might uncompress and forward uncompressed to the client. However, proxies might just as well forward it as-is (compressed) if the original request has the proper Accept-Encoding header, which, in the case of all browsers that I know of, is a given.

Btw, this issue is at least a decade old, see e.g. https://bugzilla.mozilla.org/show_bug.cgi?id=68517 .

Any clarification on this will be appreciated, both in terms of what is considered standards-compliant and what is considered practical. For example, HTTP client libraries only supporting transparent Content-Encoding would be an argument against practicality.
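To illustrate that practicality point: the widely used third-party requests library, for example, transparently decodes Content-Encoding: gzip but, as far as I can tell, has no support for gzip as a transfer-coding, which is exactly the asymmetry described above:

```python
# requests undoes Content-Encoding: gzip for us; a hypothetical
# Transfer-Encoding: gzip response would not be decoded.
# The URL is just a placeholder.
import requests

r = requests.get("https://example.com/", headers={"Accept-Encoding": "gzip"})
print(r.headers.get("Content-Encoding"))  # what the server claims it did
print(r.text[:60])                        # body arrives already decompressed
```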

Evgeniy Berezovsky
  • Related: http://stackapps.com/questions/916/why-content-encoding-gzip-rather-than-transfer-encoding-gzip – Jo Liss Sep 29 '13 at 21:40
  • Just ran into this. Curl on PHP 5.3 doesn't understand `Transfer-Encoding:gzip`, although command line curl does. To be on the safe side, send both, unless you're combining chunked and gzip. – Seva Alekseyev Dec 23 '16 at 01:22
  • 1
    @SevaAlekseyev sending both would be very wrong -- clients might try to decompress twice – Joshua Wise Jul 11 '18 at 17:49
  • This is something that's bugged me forever, too ([question I asked](https://stackoverflow.com/questions/28656068/compressing-request-body-with-python-requests))… per one of the answers to the question that @JoLiss cited, there's a [perfectly logical, semantically coherent, and standards-compliant](https://stackapps.com/questions/916/why-content-encoding-gzip-rather-than-transfer-encoding-gzip/3655#3655) way to compress request/response bodies… and basically no clients/servers use or support it. – Dan Lenski Feb 19 '20 at 17:21

2 Answers


Quoting Roy T. Fielding, one of the authors of RFC 2616:

changing content-encoding on the fly in an inconsistent manner (neither "never" nor "always") makes it impossible for later requests regarding that content (e.g., PUT or conditional GET) to be handled correctly. This is, of course, why performing on-the-fly content-encoding is a stupid idea, and why I added Transfer-Encoding to HTTP as the proper way to do on-the-fly encoding without changing the resource.

Source: https://issues.apache.org/bugzilla/show_bug.cgi?id=39727#c31

In other words: Don't do on-the-fly Content-Encoding, use Transfer-Encoding instead!

Edit: That is, unless you want to serve gzipped content to clients that only understand Content-Encoding, which, unfortunately, seems to be most of them. But be aware that you then leave the realm of the spec and may run into issues such as the one mentioned by Fielding above, as well as others, e.g. when caching proxies are involved.
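A minimal sketch of the negotiation logic this implies (header parsing is simplified, no q-values; the function name is mine):

```python
# Prefer the spec's transfer-coding route when the client advertises it,
# fall back to Content-Encoding, otherwise send the body uncompressed.
def pick_compression(request_headers: dict) -> str:
    te = request_headers.get("TE", "")
    accept = request_headers.get("Accept-Encoding", "")
    if "gzip" in te:
        return "transfer"  # Transfer-Encoding: gzip (spec-preferred, rarely supported)
    if "gzip" in accept:
        return "content"   # Content-Encoding: gzip (what happens in the wild)
    return "identity"      # no compression at all
```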

Evgeniy Berezovsky
  • So if I get it right: 1. Content-encoding refers to the content encoding on the server in the abstract, i.e. the content will consistently be served in the specified encoding by the server. 2. Transfer-encoding refers to the encoding the server decided to use to deliver it to the user agent in this instance, i.e. in this response. Just making sure I'm not misinterpreting your answer. – dot slash hack Aug 12 '14 at 22:11
  • 34
    @KemHeyndels About right. Put another way: **According to the specs, Transfer-Encoding is a pure _transport layer_ detail**, i.e. an intermediate proxy is free to undo e.g. gzip compression at that level, **whereas Content-Encoding is a _business layer_ property**, which a proxy would not be allowed to change, in addition to other ramifications (ETags etc). **According to reality** however, TE is not normally used for compression, and many servers/clients don't even support it out of the box, whereas **CE is used more or less the way TE was intented to be used**: as a _transport layer_ detail. – Evgeniy Berezovsky Aug 12 '14 at 22:52
  • 1
    So we're obliged by reality to ignore Roy T. Fielding's advice? – dot slash hack Aug 12 '14 at 23:26
  • 11
    @KemHeyndels You are obliged by idealism to go out and first add TE support to all open-source HTTP client/server implementations. Then get yourself employed at every company that has closed-source HTTP implementations (I think that's Microsoft only anyway) and add the feature there as well. After that, reality and the spec will coincide. ;) (And HTTP 2.0 will have been released, making the problem go away anyway) – Evgeniy Berezovsky Aug 13 '14 at 00:09
  • 1
    So from the webserver's standpoint it's best to always deliver gzip content - factoring in initial confusion about http deflate and raw deflate algorithm - and always set transfer-encoding to identity, just to signal to other developers that your server supports transfer-encoding. (I'm writing a webserver in node.js using just (berkeley) sockets, tls and zlib now; That's why I'm asking. I need to support the largest amount of user-agents.) – dot slash hack Aug 13 '14 at 08:47
  • 10
    Indicating that you support Transfer-Encoding still does not make clear that you support gzip over Transfer-Encoding, so that doesn't buy you anything. **Indication is done the other way around**: Any client who can do gzip via Transfer-Encoding will let the server know by setting `TE: gzip`. And then your server should go the Transfer-Encoding route. If the client only says `Accept-Encoding: gzip`, you have to do it the `Content-Encoding` way. If the client specifies neither in its request, the server mustn't gzip at all. – Evgeniy Berezovsky Aug 13 '14 at 23:25
  • When using chunked transfer encoding, if the content itself has carriage returns and newlines, and the content has not been compressed or encoded in any form, wouldn't this affect the chunked parsing? – CMCDragonkai Jul 30 '15 at 08:36
  • 2
    @CMCDragonkai Not sure why you ask a question about "chunked" transfer encoding as a comment to an answer on "transfer" vs. "content" encoding, but anyway: "chunked" chunks a stream of arbitrary bytes into a stream of "byte chunks" (of arbitrary sizes). The keyword here is "arbitrary bytes". Chunker and dechunker neither know nor care about the contents of bytes which could be any binary format like a jpeg, which may well contain the byte sequences that mean CR / NL if in a text. If you want to know exactly how it works, ask an SO question. – Evgeniy Berezovsky Jul 30 '15 at 22:37
  • 1
    In 2018, is Transfer-Encoding still under-supported? – Joshua Wise Jul 11 '18 at 19:45

The correct usage, as defined in RFC 2616 and actually implemented in the wild, is for the client to send an Accept-Encoding request header (the client may specify multiple encodings). The server may then, and only then, encode the response according to the client's supported encodings (if the file data is not already stored in that encoding) and indicate in the Content-Encoding response header which encoding is being used. The client can then read the data off the socket based on the Transfer-Encoding (i.e., chunked) and then decode it based on the Content-Encoding (i.e., gzip).

So, in your case, the client would send an Accept-Encoding: gzip request header, and the server may then decide to compress (if the content is not already compressed) and send a Content-Encoding: gzip and, optionally, a Transfer-Encoding: chunked response header.
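A hedged sketch of that layering on the receiving side (simplified: no chunk extensions or trailers are handled):

```python
# Transfer-Encoding is undone first (message level), then Content-Encoding
# (entity level) -- the order this answer describes.
import gzip

def dechunk(raw: bytes) -> bytes:
    """Strip HTTP/1.1 chunked framing from a complete message body."""
    body, pos = b"", 0
    while True:
        eol = raw.index(b"\r\n", pos)
        size = int(raw[pos:eol], 16)
        if size == 0:
            return body
        body += raw[eol + 2:eol + 2 + size]
        pos = eol + 2 + size + 2  # skip chunk data plus its trailing CRLF

def decode_body(raw: bytes, transfer_encoding: str, content_encoding: str) -> bytes:
    if "chunked" in transfer_encoding:
        raw = dechunk(raw)          # step 1: transfer level
    if content_encoding == "gzip":
        raw = gzip.decompress(raw)  # step 2: entity level
    return raw
```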

And yes, the Transfer-Encoding header can be used in requests, but only for HTTP/1.1, which requires that both client and server implementations support the chunked encoding in both directions.

ETag uniquely identifies the resource data on the server, not the data actually being transmitted. If a given URL resource changes its ETag value, it means the server-side data for that resource has changed.
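This is why a server offering both a compressed and an uncompressed representation of the same URL via Content-Encoding should hand out distinct validators; mod_deflate, for instance, appends a suffix to the ETag of the compressed variant. A toy illustration (the function and its exact suffix handling are mine):

```python
# Give each Content-Encoding variant of a resource its own validator,
# e.g. "abc123" -> "abc123-gzip", mirroring mod_deflate's behavior.
def etag_for(base_etag: str, content_encoding: str = "") -> str:
    if content_encoding == "gzip":
        return base_etag[:-1] + '-gzip"'  # insert suffix before the closing quote
    return base_etag
```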

Remy Lebeau
  • 16
    [content-coding is a characteristic of the entity identified by the Request-URI](http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.11) In other words: **Different `Content-Encoding` requires different `ETag`** This is btw what the [mod_deflate bug](https://issues.apache.org/bugzilla/show_bug.cgi?id=39727) I refer to in my answer is all about. Makes me wonder why this application-level detail is in the HTTP standard in the first place. When using `Transfer-Encoding` however, a transport level setting, there's no need to change the `ETag`. Except nobody has implemented Transfer-Enc. – Evgeniy Berezovsky May 08 '13 at 01:39
  • 3
    Content-Encoding is not for the "on the fly" encoding. RFC 2616 says "The Transfer-Encoding ... differs from the content-coding in that the transfer-coding is a property of the message, not of the entity."(https://tools.ietf.org/html/rfc2616#section-14.41), and "The content-coding is a characteristic of the entity identified by the Request-URI. Typically, the entity-body is stored with this encoding"(https://tools.ietf.org/html/rfc2616#section-14.11). So I vote down. – Robert Aug 04 '16 at 14:03
  • What I described is what is "*actually implemented in the wild*", regardless of `Content-Encoding` vs `Transfer-Encoding`. Yes, gzip *should* be a property of the transfer of a resource, if done on-the-fly. On the other hand, if the resource is stored compressed on the server, it *should* be a property of the content of the resource instead, if sent as-is. But what *should be* and what *actually is* are not always the same thing. – Remy Lebeau Aug 04 '16 at 15:14