I would like to open a StackExchange API (search endpoint) URL and parse the result [0]. The documentation says that all results are in JSON format [1]. When I open this URL in my web browser, the results look absolutely fine [2]. However, when I open it from a Python program, it returns encoded text that I am unable to parse. Here is a snippet of what I get:
á¬ôŸ?ÍøäÅ€ˆËç?bçÞIË
¡ëf)j´ñ‚TF8¯KÚpr®´Ö©iUizEÚD +¦¯÷tgNÈÑ.G¾LPUç?Ñ‘Ù~]ŒäÖÂ9Ÿð1£µ$JNóa?Z&Ÿtž'³Ðà#Ͱ¬õÅj5ŸE÷*æJî”Ï>íÓé’çÔqQI’†ksS™¾þEíqÝýly
My program to open the URL is as follows. What am I doing wrong?
import urllib
import urllib2

def open_url(query):
    ''' Opens a URL and prints the result '''
    request = urllib2.Request(query)
    response = urllib2.urlopen(request)
    text = response.read()
    #results = json.loads(text)
    print text

title = 'openRawResource, AssetManager.AssetInputStream throws IOException on read of larger files'
page1_query = stackoverflow_search_endpoint % (1,urllib.quote_plus(title),access_token,key)
[1] https://api.stackexchange.com/docs
[2] http://hastebin.com/qoxaxahaxa.sm
Solution
I found the solution. Here's how you would do it.
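import gzip
import json
import urllib2
from StringIO import StringIO

request = urllib2.Request(query)
request.add_header('Accept-encoding', 'gzip')
response = urllib2.urlopen(request)
if response.info().get('Content-Encoding') == 'gzip':
    # The body is gzip-compressed, so decompress it before parsing
    buf = StringIO(response.read())
    f = gzip.GzipFile(fileobj=buf)
    data = f.read()
else:
    data = response.read()
result = json.loads(data)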
I cannot post the complete output because it is too large. Many thanks to Evert and Kristaps for pointing out that the response needs to be decompressed and that the header has to be set on the request. A similar question that is worth looking at is [3].
[3] Does python urllib2 automatically uncompress gzip data fetched from webpage?
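For anyone on Python 3, where urllib2 no longer exists, the same decompress-then-parse step can be sketched roughly as below. This is only a minimal sketch and not part of my original program: the helper name decode_json_body and the sample payload are made up for illustration, and the body is assumed to be UTF-8 encoded JSON. It also checks the first two bytes of the body, since a gzip stream always starts with 0x1f 0x8b, in case the Content-Encoding header is missing.

```python
import gzip
import json


def decode_json_body(raw, content_encoding=None):
    """Parse a JSON response body, gunzipping it first if it is compressed.

    Hypothetical helper: `raw` is the bytes read from the response and
    `content_encoding` is the value of its Content-Encoding header, if any.
    """
    # gzip streams always begin with the magic bytes 0x1f 0x8b, so fall back
    # to checking them when the header is absent.
    if content_encoding == 'gzip' or raw[:2] == b'\x1f\x8b':
        raw = gzip.decompress(raw)
    # Assumption: the decompressed body is UTF-8 encoded JSON.
    return json.loads(raw.decode('utf-8'))


# Simulate a compressed API response so the example runs without network access.
sample = {'items': [], 'has_more': False}
payload = gzip.compress(json.dumps(sample).encode('utf-8'))
print(decode_json_body(payload, 'gzip'))
```

With a real request, the body and header would come from the response object, for example response.read() and response.headers.get('Content-Encoding') on the object returned by urllib.request.urlopen.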