I am using urllib.request. If I set Accept-Encoding to gzip, deflate, the data returned is a compressed stream, and I get a traffic savings of 60% to 80%. Is there an option to automatically decompress the data, or must I handle it myself? If the latter, what are the appropriate tools to use?
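For context, the request is made roughly like the following (a minimal sketch; the URL is a placeholder):

import urllib.request

req = urllib.request.Request(
    'http://example.com/data',
    headers={'Accept-Encoding': 'gzip, deflate'},
)
with urllib.request.urlopen(req) as resp:
    raw = resp.read()  # still a compressed stream
    encoding = resp.headers.get('Content-Encoding')  # e.g. 'gzip'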

Old Geezer
1 Answer
I recommend switching from urllib to requests. It automatically handles gzipped data.
An example:
>>> r = requests.get('https://api.github.com/events')
>>> r.text
u'[{"id":"2978193412","type":"PushEvent","actor":{"id":13182197,"login":"ClothoWong","gravatar_id":"","url":"https://api.github.com/users/...
(Clipped for brevity)
Above, you see some nice, pretty JSON, but it was actually downloaded using gzip encoding:
>>> r.raw.getheaders()['Content-Encoding']
'gzip'
(You can also confirm that the endpoint responds with gzip encoding via your favorite browser developer tool.)
requests is, in my opinion, a superior option to urllib anyway. You'll end up with far less and simpler code to do the same things.

jpmc26
- Thanks for the tip. But I am too invested in `urllib` in the current project. Will consider `requests` for new work. The answer to my question is: `zlib.decompress(gzippedContent, 16+zlib.MAX_WBITS)`. – Old Geezer Jul 16 '15 at 04:51
- @OldGeezer You can switch now. `requests` can be used for any new work sending requests (including the currently incomplete task) and old work can be left alone until it needs to be changed. That said, `requests` is so much more intuitive and simple to use, you *really* should weigh the cost of continuing to use `urllib` vs. the ramp up cost of `requests`. I think you'll find that you'll very quickly be spending more time trying to figure out `urllib` than you would have learning `requests`. Being "too heavily invested" in an improper abstraction usually ends up costing more pretty quickly. – jpmc26 Jul 16 '15 at 06:06
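For reference, the manual approach mentioned in the first comment can be combined with urllib.request roughly as follows (a minimal sketch; the URL and variable names are illustrative):

import urllib.request
import zlib

req = urllib.request.Request(
    'http://example.com/data',
    headers={'Accept-Encoding': 'gzip'},
)
with urllib.request.urlopen(req) as resp:
    body = resp.read()  # compressed bytes if the server honoured the header
    if resp.headers.get('Content-Encoding') == 'gzip':
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer
        body = zlib.decompress(body, 16 + zlib.MAX_WBITS)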