I am using urllib.request. If I set `Accept-Encoding` to `gzip, deflate`, the data returned is a compressed stream, with a traffic savings of 60% to 80%. Is there an option to decompress the data automatically, or must I handle it myself? If the latter, what are the appropriate tools to use?
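
For reference, this is roughly how I am making the request (a simplified sketch; the URL is a placeholder):

import urllib.request

req = urllib.request.Request(
    'https://example.com/api/data',  # placeholder URL
    headers={'Accept-Encoding': 'gzip, deflate'},
)
with urllib.request.urlopen(req) as resp:
    data = resp.read()  # arrives as a compressed byte stream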

Old Geezer

1 Answer

I recommend switching from `urllib` to `requests`. It handles gzipped data automatically.

An example:

>>> import requests
>>> r = requests.get('https://api.github.com/events')
>>> r.text
u'[{"id":"2978193412","type":"PushEvent","actor":{"id":13182197,"login":"ClothoWong","gravatar_id":"","url":"https://api.github.com/users/...

(Clipped for brevity)

Above, you see some nice, pretty JSON, but it was actually downloaded using gzip encoding:

>>> r.headers['Content-Encoding']
'gzip'

(You can also confirm that the endpoint responds with gzip encoding in your browser's developer tools.)

`requests` is, in my opinion, a superior option to `urllib` anyway. You'll end up with far less, and far simpler, code to do the same things.
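
As a rough sketch of what the whole task from your question looks like with `requests` (assuming the goal is simply to fetch and parse the JSON above):

import requests

r = requests.get('https://api.github.com/events')
r.raise_for_status()  # surface HTTP errors rather than parsing an error body
events = r.json()     # body is transparently decompressed and parsed as JSON

`requests` sends `Accept-Encoding: gzip, deflate` by default, so you don't even have to set the header yourself.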

jpmc26
  • Thanks for the tip. But I am too invested in `urllib` in the current project. Will consider `requests` for new work. The answer to my question is: `zlib.decompress(gzippedContent, 16+zlib.MAX_WBITS)` (fleshed out in the sketch after these comments). – Old Geezer Jul 16 '15 at 04:51
  • @OldGeezer You can switch now. `requests` can be used for any new work sending requests (including the currently incomplete task) and old work can be left alone until it needs to be changed. That said, `requests` is so much more intuitive and simple to use, you *really* should weigh the cost of continuing to use `urllib` vs. the ramp up cost of `requests`. I think you'll find that you'll very quickly be spending more time trying to figure out `urllib` than you would have learning `requests`. Being "too heavily invested" in an improper abstraction usually ends up costing more pretty quickly. – jpmc26 Jul 16 '15 at 06:06
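
For completeness, here is the `zlib` approach from the comment, fleshed out into a minimal sketch (the URL and variable names are only illustrative):

import urllib.request
import zlib

req = urllib.request.Request(
    'https://api.github.com/events',
    headers={'Accept-Encoding': 'gzip, deflate'},
)
with urllib.request.urlopen(req) as resp:
    body = resp.read()
    encoding = resp.headers.get('Content-Encoding')

if encoding == 'gzip':
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer
    body = zlib.decompress(body, 16 + zlib.MAX_WBITS)
elif encoding == 'deflate':
    # raw deflate stream; some servers send a zlib-wrapped stream instead,
    # which plain zlib.decompress(body) would handle
    body = zlib.decompress(body, -zlib.MAX_WBITS)

text = body.decode('utf-8')

On Python 3.2+, `gzip.decompress(body)` is an equivalent one-liner for the gzip case.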