I'm trying to get an HTML output from a webpage.
data = response.read()
gives me something like this:
b'\x1f\x8b\x08\x00\x00\x00\x00\...
How can I convert those characters into something like:
"<html><body>.."
?
You are dealing with a gzipped response. You can verify this by checking the Content-Encoding
response header, or by writing the beginning of that byte sequence to a file and checking its type with the file
utility if you're on a Unix-like platform:
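>>> data = b'\x1f\x8b\x08\x00\x00\x00\x00'  # bytes literal, as returned by response.read()
>>> f = open('data.bin', 'wb')  # binary mode, so the bytes are written unchanged
>>> f.write(data)
7
>>> f.close()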
$ file data.bin
data.bin: gzip compressed data, last modified: Thu Jun 16 09:32:16 1994
You could decompress it yourself, but I suggest ditching urllib for the requests module, which
automatically decompresses gzip-encoded responses:
import requests
response = requests.get(url)
print(response.text)  # decoded str; response.content holds the decompressed bytes
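
If you would rather stay with urllib and decompress the body yourself, here is a minimal sketch using the standard-library gzip module. The helper name decode_body and the UTF-8 default are my own assumptions, not something from your code; in practice the charset should come from the Content-Type header. It checks for the gzip magic bytes (0x1f 0x8b) that your output starts with before decompressing:

```python
import gzip


def decode_body(raw: bytes, charset: str = "utf-8") -> str:
    """Return the body as text, gunzipping it first if it looks gzipped.

    decode_body is a hypothetical helper; the utf-8 default is an assumption.
    """
    # Every gzip stream begins with the two magic bytes 0x1f 0x8b.
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    return raw.decode(charset)


# Stand-in for response.read(): a gzipped HTML snippet built locally.
sample = gzip.compress(b"<html><body>hello</body></html>")
print(decode_body(sample))
```

With a real response you would call decode_body(response.read()). Checking the magic bytes rather than only the Content-Encoding header is a design choice that also leaves uncompressed bodies untouched.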