Garbled text returned while opening a URL in Python 2.7

Question

I would like to open a StackExchange API (search endpoint) URL and parse the result [0]. The documentation says that all results are in JSON format [1]. When I open this URL in my web browser, the results look fine [2]. However, when I open it from a Python program, it returns encoded text that I am unable to parse. Here's a snippet:

á¬ôŸ?ÍøäÅ€ˆËç?bçÞIË
¡ëf)j´ñ‚TF8¯KÚpr®´Ö©iUizEÚD +¦¯÷tgNÈÃ‘.G¾LPUç?Ñ‘Ù~]ŒäÖÂ9Ÿð1£µ$JNóa?Z&Ÿtž'³Ðà#Í°¬õÅj5ŸE÷*æJî”Ï>íÓé’çÔqQI’†ksS™¾þEíqÝýly

My program to open a URL is as follows. What am I doing wrong?

import json
import urllib
import urllib2

def open_url(query):
    ''' Opens a URL and prints the raw response body '''
    request = urllib2.Request(query)
    response = urllib2.urlopen(request)
    text = response.read()
    #results = json.loads(text)
    print text


title = 'openRawResource, AssetManager.AssetInputStream throws IOException on read of larger files'

page1_query = stackoverflow_search_endpoint % (1, urllib.quote_plus(title), access_token, key)

[0] https://api.stackexchange.com/2.1/search/advanced?page=1&pagesize=100&order=desc&sort=relevance&q=openRawResource%2C+AssetManager.AssetInputStream+throws+IOException+on+read+of+larger+files&site=stackoverflow&access_token=******&key=******

[1] https://api.stackexchange.com/docs

[2] http://hastebin.com/qoxaxahaxa.sm

@kristaps No - I believe I should but I do not quite know the procedure. Can you help me out? — Dexter, Oct 01 '12 at 11:32
Apart from setting headers, you should probably check the header info you get back as well. See the urllib2 documentation. For example, `response.info()` has some metadata, including header information. You can set header information on the Request() object using `request.add_header(key, val)`. See examples at the bottom of http://docs.python.org/library/urllib2.html. — , Oct 01 '12 at 11:50
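
As a rough illustration of that comment, the sketch below sets a request header and reads one back from a response. The helper names `build_request` and `content_encoding` and the example URL are made up for illustration; the try/except import is only there so the same lines also run on Python 3, and nothing is actually fetched.

```python
try:
    import urllib2 as request_lib          # Python 2.7, as in the question
except ImportError:
    import urllib.request as request_lib   # Python 3 equivalent

def build_request(url):
    """Return a Request that explicitly asks the server for gzip."""
    req = request_lib.Request(url)
    req.add_header('Accept-Encoding', 'gzip')
    return req

def content_encoding(response):
    """Return the lowercased Content-Encoding of a response, or ''."""
    # response.info() exposes the response headers; .get() returns None
    # when the header is missing (for example, stripped by a proxy).
    return (response.info().get('Content-Encoding') or '').lower()

# Hypothetical URL, used only to build the request object.
req = build_request('https://api.example.com/search')
```

Passing `req` to `urlopen` and the result to `content_encoding` would then tell you whether the body still needs decompressing.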

score 2 · Answer 1 · answered Oct 01 '12 at 11:27


The next paragraph of the documentation says:

Additionally, all API responses are compressed. The Content-Encoding header is always set, but some proxies will strip this out. The proper way to decode API responses can be found here.

Your output does look like it may be compressed. Browsers automatically decompress data (depending on the Content-Encoding), so you would need to look at the header and do the same: results = json.loads(zlib.decompress(text)) or something similar.

Do check the here link as well.
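
One likely reason a bare `zlib.decompress(text)` fails on this kind of response is that it expects a zlib header, while a gzip body starts with a gzip header. A minimal sketch, assuming the body really is gzip: passing `16 + zlib.MAX_WBITS` as the second argument tells zlib to expect the gzip wrapper. The helper name `gunzip_with_zlib` and the sample payload are invented for illustration, and the payload is built locally so no network access is needed.

```python
import gzip
import io
import json
import zlib

def gunzip_with_zlib(raw):
    """Decompress a gzip-wrapped byte string using zlib."""
    # 16 + MAX_WBITS makes zlib expect a gzip header and trailer.
    return zlib.decompress(raw, 16 + zlib.MAX_WBITS)

# Build a small gzip payload locally (made-up data, not a real API response).
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode='wb') as gz:
    gz.write(json.dumps({'items': []}).encode('utf-8'))
compressed = buf.getvalue()

# Without the wbits argument, zlib rejects the gzip header.
try:
    zlib.decompress(compressed)
    plain_call_failed = False
except zlib.error:
    plain_call_failed = True

result = json.loads(gunzip_with_zlib(compressed).decode('utf-8'))
```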


Thanks for the response. It looks like I need to add some headers to the original API call. zlib.compress does not work directly and throws an "incorrect header check" error. I am not quite an expert here. Any help to achieve my goal would be great! Thanks. – Dexter Oct 01 '12 at 11:35
I take it you mean `zlib.decompress`. There may be some extra information before; can you show the first few lines or 100 characters or so that you get back? – Oct 01 '12 at 11:42
I did use zlib.decompress. I can not paste those characters here. – Dexter Oct 01 '12 at 11:47

score 1 · Accepted Answer · edited May 23 '17 at 11:48

I found the solution. Here's how you would do it.

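import gzip
import json
import urllib2
from StringIO import StringIO

request = urllib2.Request(query)
# Ask for gzip explicitly; the API compresses its responses.
request.add_header('Accept-encoding', 'gzip')
response = urllib2.urlopen(request)
data = response.read()
if response.info().get('Content-Encoding') == 'gzip':
    # Wrap the compressed bytes in a file-like object for GzipFile.
    data = gzip.GzipFile(fileobj=StringIO(data)).read()
result = json.loads(data)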

I can't post the complete output because it is too large. Many thanks to Evert and Kristaps for pointing out the decompression and the request headers. A similar question worth looking at is [1].

[1] Does python urllib2 automatically uncompress gzip data fetched from webpage?
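
Since the API documentation quoted in the other answer warns that some proxies strip the Content-Encoding header, a slightly more defensive variant could also look at the first two bytes of the body. This is only a sketch: `decode_json_body` and `GZIP_MAGIC` are names invented here, and it uses `io.BytesIO` instead of `StringIO` so the same lines run on Python 2.7 and Python 3.

```python
import gzip
import io
import json

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b'\x1f\x8b'

def decode_json_body(raw, content_encoding=None):
    """Parse a JSON response body, decompressing it first if it is gzip."""
    encoding = (content_encoding or '').lower()
    # Trust the header when present; otherwise fall back to sniffing the
    # magic bytes, in case a proxy removed Content-Encoding.
    if encoding == 'gzip' or raw[:2] == GZIP_MAGIC:
        raw = gzip.GzipFile(fileobj=io.BytesIO(raw)).read()
    return json.loads(raw.decode('utf-8'))
```

With the code above it would be called as `decode_json_body(response.read(), response.info().get('Content-Encoding'))`.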
