
I want to read non-ASCII JSON data (in my case, Persian text) from a web page. Here is my code in Python 2.7:

import json
import urllib2
jsonObject = json.loads(urllib2.urlopen("https://api.instagram.com/v1/users/" + userId + "/?access_token=" + accessToken).read().decode('utf-8').encode('utf-8'))
print jsonObject

Unfortunately, even after decoding and encoding, I got a result like this:

{u'meta': {u'code': 200}, u'data': {u'username': u'*******', u'bio': u'\u0639\u06a9\u0633 \u062f\u0648 \u0646\u0641\u0631\u062a\u0648\u0646 \u0631\u0648 \u0627\u0631\u0633\u0627\u0644 \u06a9\u0646\u06cc\u062f\U0001f48f\U0001f491', u'website': u'', u'profile_picture': u'*****', u'full_name': u'\U0001f451\u0639\u0634\u0642 \u0647\u0627\u06cc \u0627\u06cc\u0631\u0627\u0646\u06cc\U0001f451', u'counts': {u'media': 31, u'followed_by': 12449, u'follows': 0}, u'id': u'*******'}}

What do I need to do to get the characters displayed correctly?

Asked by anahita, edited by Martijn Pieters.

1 Answer


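That's normal. When you print a container such as a dict or a list, Python 2 shows the repr() of its contents, and repr() escapes non-ASCII characters (here as \uXXXX and \UXXXXXXXX sequences) so the output stays ASCII-friendly. The u'' prefixes in your output come from the same repr() form.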

Print the individual string values, and you'll see that the actual value is still there:

>>> print jsonObject['data']['bio']
عکس دو نفرتون رو ارسال کنید
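The difference between the escaped container output and the actual string value can be checked directly. The sketch below is a minimal illustration, not part of the answer: it assumes a hypothetical, trimmed-down JSON payload in place of the real Instagram response, and it swaps in `json.dumps(..., ensure_ascii=False)` as one way to view a whole structure without escape sequences. It uses only single-argument `print(...)` calls and `u''` literals, so it runs on Python 2.7 as well as Python 3.3+.

```python
import json

# Hypothetical, trimmed-down stand-in for the API response: the "bio" is the
# first word of the question's bio, written with JSON \u escapes.
raw = '{"meta": {"code": 200}, "data": {"bio": "\\u0639\\u06a9\\u0633"}}'
obj = json.loads(raw)

# The parsed value holds three real characters, not escape sequences.
bio = obj[u"data"][u"bio"]
print(len(bio))                      # prints 3
print(bio == u"\u0639\u06a9\u0633")  # prints True

# The default json.dumps (ensure_ascii=True) escapes non-ASCII text, much
# like Python 2's repr() of a container does.
escaped = json.dumps(obj, sort_keys=True)
print(u"\\u0639" in escaped)         # prints True

# ensure_ascii=False keeps the characters themselves in the output string.
readable = json.dumps(obj, ensure_ascii=False, sort_keys=True)
print(u"\\u0639" in readable)        # prints False
```

On Python 2, printing `readable` (or any unicode value) still requires a terminal, or a `PYTHONIOENCODING` setting, that can encode the characters; when stdout is a pipe with no encoding, printing raises `UnicodeEncodeError`.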
Answered by Martijn Pieters.
  • Thanks! But why do I have to encode it again when I want to write it to a CSV file? `writer = csv.writer(open("hi.csv", 'w')); writer.writerow([jsonObject[u'data'][u'bio'].encode('utf-8')])` I already encoded it once! – anahita Dec 13 '15 at 17:51
  • @anahita: the Python 2 version of the `csv` module is known to not handle Unicode very well. The documentation page for the module has some work-arounds for that. Also see [Read and Write CSV files including unicode with Python 2.7](http://stackoverflow.com/q/17245415) – Martijn Pieters Dec 13 '15 at 17:56
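On the question in the comment: the `.decode('utf-8').encode('utf-8')` in the original snippet runs before `json.loads`, and in Python 2.7 `json.loads` returns `unicode` objects for every string value. So `jsonObject[u'data'][u'bio']` is `unicode` again and has to be encoded before the Python 2 `csv` module can write it. Below is a minimal sketch of one way to handle this. It assumes a hypothetical helper, `write_unicode_rows` (not part of the `csv` module), that encodes each cell on Python 2 and lets a UTF-8 text file do the encoding on Python 3; the sample value and the temporary path are placeholders.

```python
import csv
import io
import os
import sys
import tempfile


def write_unicode_rows(path, rows):
    """Write rows of unicode strings to `path` as UTF-8 encoded CSV (hypothetical helper)."""
    if sys.version_info[0] >= 3:
        # Python 3: csv works on text; the file object does the encoding.
        with io.open(path, "w", encoding="utf-8", newline="") as f:
            csv.writer(f).writerows(rows)
    else:
        # Python 2: csv only writes byte strings, so each unicode cell is
        # encoded to UTF-8 bytes first (the "second" .encode('utf-8')).
        with open(path, "wb") as f:
            writer = csv.writer(f)
            for row in rows:
                writer.writerow([cell.encode("utf-8") for cell in row])


# Hypothetical sample value (first word of the question's bio) and a
# temporary directory instead of "hi.csv" in the working directory.
bio = u"\u0639\u06a9\u0633"
path = os.path.join(tempfile.mkdtemp(), "hi.csv")
write_unicode_rows(path, [[bio]])

# The file holds the UTF-8 bytes followed by csv's default "\r\n" terminator.
with open(path, "rb") as f:
    raw_bytes = f.read()
print(raw_bytes == bio.encode("utf-8") + b"\r\n")  # prints True
```

Opening the file in binary mode (`'wb'`) on Python 2 and with `newline=''` on Python 3 follows the `csv` module documentation. The `UnicodeWriter` recipe in the Python 2 `csv` documentation is a more general variant of the same encode-each-cell approach that also supports output encodings other than UTF-8.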