I'm getting this response when I open this URL:

from urllib.request import Request, urlopen

r = Request(r'http://airdates.tv/')
h = urlopen(r).readline()
print(h)

Response:

b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x00\xed\xbdkv\xdbH\x96.\xfa\xbbj\x14Q\xaeuJ\xce\xee4E\x82\xa4(9m\xe7\xd2\xd3VZ\xaf2e\xab2k\xf5\xc2\n'

What encoding is this? Is there a way to decode it using only the standard library?
Thank you in advance for any insight on this matter!

PS: It seems to be gzip.

TCN

2 Answers

9

It's gzip compressed HTML, as you suspected.
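
One way to confirm this from the bytes themselves is to check the gzip magic number: per RFC 1952, a gzip stream starts with `\x1f\x8b`, and the third byte names the compression method (8 means deflate). A minimal Python 3 sketch, where `looks_like_gzip` is a hypothetical helper name and `sample` is just the first few bytes of the response from the question:

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def looks_like_gzip(data):
    """Hypothetical helper: True if data starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


# The first bytes of the response shown in the question
sample = b"\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x00\xed\xbd"

is_gzip = looks_like_gzip(sample)
# Third byte is the compression method; 8 means deflate
uses_deflate = sample[2] == 8
```

This only inspects the header bytes; it does not validate the rest of the stream.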

Rather than using urllib, use requests, which will decompress the response for you:

import requests

r = requests.get('http://airdates.tv/')
print(r.text)

You can install it with pip install requests, and never look back.


If you really must restrict yourself to the standard library, then decompress it with the gzip module:

import gzip
from urllib.request import urlopen  # Python 2: from urllib2 import urlopen

f = urlopen('http://airdates.tv/')

# how to determine the content encoding
content_encoding = f.headers.get('Content-Encoding')
#print(content_encoding)

data = f.read()
response = data  # used as-is when the body is not compressed

# how to decompress gzip data with Python 3
if content_encoding == 'gzip':
    response = gzip.decompress(data)

# decompress with Python 2 (use instead of the Python 3 branch above)
# from cStringIO import StringIO
# if content_encoding == 'gzip':
#     gz = gzip.GzipFile(fileobj=StringIO(data))
#     response = gz.read()
mhawke
  • I see, requests does handle it without breaking a sweat. I would still prefer to get it done with a standard library. I think this answer may lead me to such a solution: http://stackoverflow.com/questions/6123223/howto-uncompress-gzipped-data-in-a-byte-array – TCN Oct 23 '16 at 09:01
  • got it: `zlib.decompress(gz_data, 16+zlib.MAX_WBITS)` – TCN Oct 23 '16 at 09:07
  • oh I posted the solution I am using, but your answer is more complete! Retrieving the content-encoding from the page is very useful! Thank you. – TCN Oct 23 '16 at 09:26
  • @zvone: yes, in Python 3, but not Python 2. OP is probably using Python 3 so this is a good point and I've edited the answer accordingly. – mhawke Oct 23 '16 at 09:52
  • Yes, I am using Python 3. – TCN Oct 23 '16 at 09:58
0

mhawke's solution (using requests instead of urllib) works perfectly and should be preferred in most cases. That said, I was looking for a solution that does not require installing third-party libraries (hence my choice of urllib over requests).

I found a solution using standard libraries:

import zlib
from urllib.request import Request, urlopen

r = Request(r'http://airdates.tv/')
h = urlopen(r).read()
decomp_gzip = zlib.decompress(h, 16+zlib.MAX_WBITS)
print(decomp_gzip)

Which yields the following response:

b'<!DOCTYPE html>\n (continues...)'
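
As a side note on the `16+zlib.MAX_WBITS` argument: `zlib.MAX_WBITS` is the largest window size zlib supports, and adding 16 tells zlib to expect a gzip header and trailer rather than a zlib one. A small offline sketch of this, assuming Python 3; `original` is a made-up stand-in for the page body, not the real response:

```python
import gzip
import zlib

# Stand-in for the page body (an assumption, not the real airdates.tv response)
original = b"<!DOCTYPE html>\n<html></html>\n"
compressed = gzip.compress(original)

# wbits = 16 + MAX_WBITS: the input must carry a gzip header and trailer
via_zlib = zlib.decompress(compressed, 16 + zlib.MAX_WBITS)

# The gzip module gives the same bytes back
via_gzip = gzip.decompress(compressed)
```

Both calls should return the original bytes, so on Python 3 either module can be used for a gzip-encoded body.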
TCN
  • Imagine the server enables and disables compression and you have no knowledge of it. Will this work for a non-compressed stream, or will it throw an error? The beauty of the "requests" approach is that it handles this automatically, as I understand it. Just curious. – CaptainCrunch Nov 28 '20 at 20:28
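
On the question in the comment above: calling `zlib.decompress(data, 16 + zlib.MAX_WBITS)` on data that is not gzip raises `zlib.error`, so the body has to be checked first. A minimal Python 3 sketch of one way to do that; `body_bytes` is a hypothetical helper name, and `plain` and `packed` are made-up sample bodies rather than real responses:

```python
import gzip
import zlib


def body_bytes(raw, content_encoding=None):
    """Hypothetical helper: gunzip raw only when it looks like gzip data."""
    # Trust the Content-Encoding header if given, else fall back to the magic number
    if content_encoding == "gzip" or raw[:2] == b"\x1f\x8b":
        return zlib.decompress(raw, 16 + zlib.MAX_WBITS)
    return raw


plain = b"<!DOCTYPE html>\n"   # uncompressed body (sample data)
packed = gzip.compress(plain)  # the same body, gzip-compressed

# Decompressing uncompressed data directly fails
try:
    zlib.decompress(plain, 16 + zlib.MAX_WBITS)
    raised = False
except zlib.error:
    raised = True
```

With urllib, the header value would come from something like `response.headers.get('Content-Encoding')`, as in mhawke's answer.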