
I wrote a bash script that fetches a page from a website using curl and does a bunch of string manipulation on the HTML output. The problem is that it breaks against a site that returns its output gzipped. Visiting the site in a browser works fine.

When I run curl by hand, I get gzipped output:

$ curl "http://example.com"

Here's the header from that particular site:

HTTP/1.1 200 OK
Server: nginx
Content-Type: text/html; charset=utf-8
X-Powered-By: PHP/5.2.17
Last-Modified: Sat, 03 Dec 2011 00:07:57 GMT
ETag: "6c38e1154f32dbd9ba211db8ad189b27"
Expires: Sun, 19 Nov 1978 05:00:00 GMT
Cache-Control: must-revalidate
Content-Encoding: gzip
Content-Length: 7796
Date: Sat, 03 Dec 2011 00:46:22 GMT
X-Varnish: 1509870407 1509810501
Age: 504
Via: 1.1 varnish
Connection: keep-alive
X-Cache-Svr: p2137050.pubip.peer1.net
X-Cache: HIT
X-Cache-Hits: 425

I know the returned data is gzipped, because this returns HTML, as expected:

$ curl "http://example.com" | gunzip

I don't want to pipe the output through gunzip unconditionally, because the script works as-is on other sites, and piping everything through gunzip would break it for sites that return plain output.
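If changing the curl invocation isn't an option, one way to keep the existing pipeline working on both kinds of sites is a small filter that decompresses only when the data is actually gzipped. This is just a sketch, and `maybe_gunzip` is a made-up name: it relies on the fact that gzip streams always begin with the magic bytes `1f 8b`.

```shell
# maybe_gunzip: pass stdin through unchanged unless it is gzipped.
# gzip streams always start with the two magic bytes 0x1f 0x8b.
maybe_gunzip() {
  local tmp
  tmp=$(mktemp) || return 1
  cat > "$tmp"
  if [ "$(head -c 2 "$tmp" | od -An -tx1 | tr -d ' \n')" = "1f8b" ]; then
    gunzip -c "$tmp"
  else
    cat "$tmp"
  fi
  rm -f "$tmp"
}
```

The script's existing `curl "$url" | ...` calls would then become `curl "$url" | maybe_gunzip | ...`, which is a no-op for sites that already return plain HTML.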

What I've tried

  1. changing the user agent (I tried the same string my browser sends, "Mozilla/4.0", etc.)
  2. man curl
  3. google search
  4. searching stackoverflow

Everything came up empty.

Any ideas?

BryanH
  • For me, the problem was that cURL wasn't able to decompress Brotli (`curl 7.54.0 (x86_64-apple-darwin17.0) libcurl/7.54.0 LibreSSL/2.0.20 zlib/1.2.11 nghttp2/1.24.0`); I solved it by removing `br` from `Accept-Encoding`. See https://stackoverflow.com/questions/18983719/is-there-any-way-to-get-curl-to-decompress-a-response-without-sending-the-accept – The Onin Sep 04 '18 at 11:48
  • The behavior has supposedly been changed. Try `curl -sSv https://stackoverflow.com/ |& rg -i 'gzip|accept'` alone, and with `--compressed`. Unless `curl` passes `Accept-Encoding`, the server doesn't gzip the response. – x-yuri Jan 05 '21 at 05:04

2 Answers


curl will automatically decompress the response if you set the --compressed flag:

curl --compressed "http://example.com"

--compressed (HTTP) Request a compressed response using one of the algorithms libcurl supports, and save the uncompressed document. If this option is used and the server sends an unsupported encoding, curl will report an error.

gzip is most likely supported, but you can check this by running curl -V and looking for libz somewhere in the "Features" line:

$ curl -V
...
Protocols: ...
Features: GSS-Negotiate IDN IPv6 Largefile NTLM SSL libz 

Note that it's really the website in question that is at fault here. If curl did not pass an Accept-Encoding: gzip request header, the server should not have sent a compressed response.

Martin
  • This would appear to be a curl bug, because it should trigger its decoding based on the response, not on what it requested (given that it does support gzip). To quote HTTP/1.1: "If no Accept-Encoding field is present in a request, the server MAY assume that the client will accept any content coding." But it does go on to say that servers SHOULD in that case not encode the content, hmm, go figure. – George Lund Feb 21 '13 at 16:37
  • Actually, on my version `--comp`, `--compress`, and `--compressed` all work. – Radu Toader Jun 13 '16 at 13:14
  • This also sets the request header `Accept-Encoding: deflate, gzip`. That's great: whether the server serves gzip or not, you just need --compressed and don't have to add the Accept-Encoding header yourself. – mbert Feb 27 '17 at 15:47
  • Helped my QA with this solution in one minute, thank you! That said, my application actually sends a gzip response with Content-Encoding: gzip. Browsers and modern tools (e.g. HTTPie) handle it automatically; I guess curl just needs a "hint". – Faraway Apr 18 '18 at 23:15
  • Surprisingly, setting `Accept-Encoding: deflate, gzip` is not enough - even if the server returns a gzip response with `Content-Encoding: gzip`, curl won't automatically ungzip it. The `--compressed` flag is required. – rjh Jan 08 '19 at 20:06
  • (and remove --raw if it's there). – jonincanada Apr 15 '19 at 20:56
  • A funny thing is that the [man page](https://man7.org/linux/man-pages/man1/curl.1.html) says, "Headers are not modified." Although they clearly are. – x-yuri Jan 05 '21 at 04:56
  • Re "But it does go on to say that servers SHOULD in that case not encode the content, hmm, go figure": I think the part of the standard you're referencing ("An Accept-Encoding header field with a combined field-value that is empty implies that the user agent does not want any content-coding in response.") is about an empty `Accept-Encoding: ` header (vs. the header not being present at all). (Unless you mean the next sentence, "If an Accept-Encoding header field is present in a request […] listed as acceptable"; I think that just applies to other cases, but I agree it's confusingly worded.) – Thanatos Apr 26 '22 at 19:03

In the relevant bug report, "Raw compressed output when not using --compressed but server returns gzip data" (#2836), the developer says:

The server shouldn't send content-encoding: gzip without the client having signaled that it is acceptable.

Besides, when you don't use --compressed with curl, you tell the command line tool you rather store the exact stream (compressed or not). I don't see a curl bug here...

So if the server could be sending gzipped content, use --compressed to let curl decompress it automatically.

cweiske
  • That is not always reasonable or possible. If a server you don't own is configured incorrectly, it is unlikely you can get them to fix it. Coding defensively is a good approach to this problem. See the [comment by George Lund](https://stackoverflow.com/a/8365089/41688) for yet another reason why _Everything is Broken_ ™. – BryanH Dec 09 '21 at 18:59
  • I hate to contradict him of all people, since I figure he knows HTTP pretty freaking well, but… "The server shouldn't send content-encoding: gzip without the client having signaled that it is acceptable." Thing is, `curl` *does* signal that it is acceptable, by omitting the `Accept-Encoding` header. The standard says, in that case, "If no Accept-Encoding field is in the request, any content-coding is considered acceptable by the user agent." (To signal that no encoding is acceptable, I think, would require either `Accept-Encoding: identity`, `*;q=0`, or an empty header.) – Thanatos Apr 26 '22 at 18:59