Decoding urllib.request response

Question

I get the following response when I open this URL:

from urllib.request import Request, urlopen

r = Request(r'http://airdates.tv/')
h = urlopen(r).readline()
print(h)

Response:

b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x00\xed\xbdkv\xdbH\x96.\xfa\xbbj\x14Q\xaeuJ\xce\xee4E\x82\xa4(9m\xe7\xd2\xd3VZ\xaf2e\xab2k\xf5\xc2\n'

What encoding is this? Is there a way to decode it using only the standard library?
Thank you in advance for any insight on this matter!

PS: It seems to be gzip.

mhawke · Accepted Answer · 2016-10-23T09:54:38.987

9

It's gzip compressed HTML, as you suspected.
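
One way to confirm this from the bytes alone: a gzip stream starts with the two magic bytes 0x1f 0x8b, and the third byte, 0x08, names the deflate method (RFC 1952). A minimal sketch follows; `looks_like_gzip` is a hypothetical helper name, and the sample is just the first ten bytes of the output shown in the question:

```python
GZIP_MAGIC = b'\x1f\x8b'  # first two bytes of every gzip member (RFC 1952)

def looks_like_gzip(data):
    """Return True if data begins with the gzip magic number."""
    return data[:2] == GZIP_MAGIC

# First bytes of the response shown in the question.
sample = b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x00'
print(looks_like_gzip(sample))              # True
print(looks_like_gzip(b'<!DOCTYPE html>'))  # False
```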

Rather than use urllib, use requests, which will decompress the response for you:

import requests

r = requests.get('http://airdates.tv/')
print(r.text)

You can install it with pip install requests, and never look back.

If you really must restrict yourself to the standard library, then decompress it with the gzip module:

import gzip
from urllib.request import urlopen  # Python 2: from urllib2 import urlopen

f = urlopen('http://airdates.tv/')

# how to determine the content encoding
content_encoding = f.headers.get('Content-Encoding')
#print(content_encoding)

# read the body once; it is left as-is if it is not compressed
response = f.read()

# how to decompress gzip data with Python 3
if content_encoding == 'gzip':
    response = gzip.decompress(response)

# Python 2 has no gzip.decompress; use GzipFile there instead of the block above:
# from cStringIO import StringIO
# if content_encoding == 'gzip':
#     response = gzip.GzipFile(fileobj=StringIO(response)).read()
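
Putting the Python 3 pieces together, here is a minimal sketch that decompresses only when the header says gzip and then decodes the bytes to text. `decode_body` and `fetch_text` are hypothetical names, and falling back to UTF-8 when the server declares no charset is an assumption:

```python
import gzip
from urllib.request import urlopen

def decode_body(raw, content_encoding=None, charset=None):
    """Decompress raw if it is gzip-encoded, then decode it to str."""
    if content_encoding == 'gzip':
        raw = gzip.decompress(raw)
    # Assumption: fall back to UTF-8 when no charset is declared.
    return raw.decode(charset or 'utf-8', errors='replace')

def fetch_text(url):
    """Fetch url with urllib and return the body as text."""
    with urlopen(url) as f:
        return decode_body(
            f.read(),
            f.headers.get('Content-Encoding'),
            f.headers.get_content_charset(),
        )

# Usage (needs network access):
# print(fetch_text('http://airdates.tv/'))
```

Keeping the decoding step separate from the network call makes it possible to check the logic offline with bytes produced by `gzip.compress`.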

edited Oct 23 '16 at 09:54

answered Oct 23 '16 at 08:54

mhawke


I see, requests does handle it without breaking a sweat. I would still prefer to get it done with a standard library. I think this answer may lead me to such a solution: http://stackoverflow.com/questions/6123223/howto-uncompress-gzipped-data-in-a-byte-array – TCN Oct 23 '16 at 09:01
got it: `zlib.decompress(gz_data, 16+zlib.MAX_WBITS)` – TCN Oct 23 '16 at 09:07
oh I posted the solution I am using, but your answer is more complete! Retrieving the content-encoding from the page is very useful! Thank you. – TCN Oct 23 '16 at 09:26

@zvone: yes, in Python 3, but not Python 2. OP is probably using Python 3 so this is a good point and I've edited the answer accordingly. – mhawke Oct 23 '16 at 09:52
Yes, I am using Python 3. – TCN Oct 23 '16 at 09:58

score 0 · Answer 2 · answered Oct 23 '16 at 09:22

mhawke's solution (using requests instead of urllib) works perfectly and in most cases should be preferred. That said, I was looking for a solution that does not require installing third-party libraries (hence my choice of urllib over requests).

I found a solution using only the standard library:

import zlib
from urllib.request import Request, urlopen

r = Request(r'http://airdates.tv/')
h = urlopen(r).read()
decomp_gzip = zlib.decompress(h, 16+zlib.MAX_WBITS)
print(decomp_gzip)

Which yields the following response:

b'<!DOCTYPE html>\n (continues...)'
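
For context on the `16+zlib.MAX_WBITS` argument: per the zlib module documentation, adding 16 to the window size tells zlib to expect a gzip header and trailer, while adding 32 makes it auto-detect a gzip or zlib header. A small offline sketch, with made-up sample data standing in for the real response:

```python
import gzip
import zlib

html = b'<!DOCTYPE html>\n<html></html>'  # stand-in for a real page
gz_data = gzip.compress(html)             # gzip container
zl_data = zlib.compress(html)             # zlib container

# 16 + MAX_WBITS: expect a gzip header and trailer.
assert zlib.decompress(gz_data, 16 + zlib.MAX_WBITS) == html

# 32 + MAX_WBITS: auto-detect a gzip or zlib header.
assert zlib.decompress(gz_data, 32 + zlib.MAX_WBITS) == html
assert zlib.decompress(zl_data, 32 + zlib.MAX_WBITS) == html
```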

Imagine the server enables and disables compression and you have no knowledge of it. Will this work for a non-compressed stream, or will it throw an error? The beauty of the "requests" approach is that it handles this automatically, as I understand it. Just curious. – CaptainCrunch, Nov 28 '20 at 20:28
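
On the question in the comment above: with `16+zlib.MAX_WBITS`, passing data that is not gzip raises `zlib.error`, so the call does not handle an uncompressed response on its own. A minimal sketch of a fallback follows; `gunzip_if_needed` is a hypothetical helper name, and returning the input unchanged on failure is an assumption about the desired behaviour:

```python
import gzip
import zlib

def gunzip_if_needed(data):
    """Return gzip-decompressed data, or data unchanged if it is not gzip."""
    try:
        return zlib.decompress(data, 16 + zlib.MAX_WBITS)
    except zlib.error:
        # Not a gzip stream (or a corrupt one): assume it was sent uncompressed.
        return data

plain = b'<!DOCTYPE html>'
print(gunzip_if_needed(gzip.compress(plain)) == plain)  # True
print(gunzip_if_needed(plain) == plain)                 # True
```

Checking the Content-Encoding header, as in the accepted answer, is the more explicit route; the try/except only covers the case where the header is unavailable.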
