As an exercise I built a little script that queries the Google Suggest JSON API. The code is quite simple:
import json
import urllib

query = 'a'
url = "http://clients1.google.co.jp/complete/search?hl=ja&q=%s&json=t" % query
response = urllib.urlopen(url)
result = json.load(response)
The last line fails with:
UnicodeDecodeError: 'utf8' codec can't decode byte 0x83 in position 0: invalid start byte
If I call read() on the response object instead, this is what I get:
'["a",["amazon","ana","au","apple","adobe","alc","\x83A\x83}\x83]\x83\x93","\x83A\x83\x81\x83u\x83\x8d","\x83A\x83X\x83N\x83\x8b","\x83A\x83\x8b\x83N"],["","","","","","","","","",""]]'
So it seems that the error is raised when Python tries to decode the string. This only happens with google.co.jp and the Japanese language. I tried the same code with different countries/languages and I do not get the same issue: when I deserialize the response, everything works OK.
- I checked the response headers, and they always specify utf-8 as the response encoding.
- I checked the JSON string with an online parser (http://json.parser.online.fr/) and again all seems OK.
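To narrow this down, the raw bytes can be tested against a few candidate encodings, independently of what the headers declare. This is only a rough sketch: the helper name try_decodings and the candidate list (utf-8, shift_jis, euc_jp) are my own assumptions, and the sample is a shortened copy of the read() output above, not a fresh response from the server.

```python
import json

# Shortened copy of the bytes returned by read() above (two suggestions only).
raw = b'["a",["amazon","\x83A\x83}\x83]\x83\x93"],["",""]]'

def try_decodings(data, candidates=("utf-8", "shift_jis", "euc_jp")):
    """Hypothetical helper: return {encoding: parsed JSON} for every
    candidate encoding that both decodes the bytes and yields valid JSON."""
    results = {}
    for name in candidates:
        try:
            results[name] = json.loads(data.decode(name))
        except ValueError:
            # Covers UnicodeDecodeError as well as JSON parse errors.
            continue
    return results

decoded = try_decodings(raw)
for name, value in decoded.items():
    print("%s decodes cleanly; suggestions: %r" % (name, value[1]))
```

If an encoding other than utf-8 is the only one that survives, that would suggest the body is not actually encoded the way the header claims, at least for this sample.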
Any ideas on how to solve this problem? What makes json.load() choke?
Thanks in advance.