I am trying to parse this JSON file: http://pastebin.com/VcVR0ue0

I am using these modules:

from pprint import pprint
import codecs
import json

file = 'Desktop10000_760_CurtSacks.json'

I've tried these methods

a)

data = json.load(open(file))

b)

data = json.load(codecs.open(file, encoding='utf_8_sig'))

In both cases the output has a u in front of every key and every string value:

{u'document_tone': {u'tone_categories': [{u'category_id': u'emotion_tone',
                                          u'category_name': u'Emotion Tone',
                                          u'tones': [{u'score': 0.111838,
                                                      u'tone_id': u'anger',
                                                      u'tone_name': u'Anger'},
                                                     {u'score': 0.159831,
                                                      u'tone_id': u'disgust',
                                                      u'tone_name': u'Disgust'},
                                                     {u'score': 0.17082,
                                                      u'tone_id': u'fear',
                                                      u'tone_name': u'Fear'},
                                                     {u'score': 0.507748,
                                                      u'tone_id': u'joy',
                                                      u'tone_name': u'Joy'},
                                                     {u'score': 0.520722,
                                                      u'tone_id': u'sadness',
                                                      u'tone_name': u'Sadness'}]},

How do I read the file correctly?

samkhan13
  • thank you. now i feel this question wasn't an important one at all. but i'm glad i learned something basic. – samkhan13 Mar 11 '17 at 03:22

2 Answers

It looks like everything's being parsed properly.

Python 2's literal syntax for a unicode string is:

u'Here is the string.'

So the Python equivalent of this JSON:

{"foo": "bar"}

is this:

{u'foo': u'bar'}

If you just print out the Python representation of the data, you'll see the Python syntax.
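
To make the distinction concrete, here is a minimal sketch. The sample string and the variable names are made up for illustration and are not taken from the linked file. The u prefix belongs only to the repr() of a Python 2 unicode string; printing a single value, or serialising the data back with json.dumps, shows no prefix. On Python 3 every string is unicode and the prefix is not displayed at all.

```python
import json

# Hypothetical sample input, not the file from the question.
raw = '{"foo": "bar"}'
data = json.loads(raw)

# repr of the container: Python 2 shows {u'foo': u'bar'},
# Python 3 shows {'foo': 'bar'}. The data is the same either way.
print(repr(data))

# Printing one value shows only its characters, with no prefix.
print(data["foo"])

# Serialising back to JSON yields JSON syntax again.
round_trip = json.dumps(data)
print(round_trip)
```

So the prefix is a property of how Python displays the object, not something stored in the parsed data.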

user94559

The 'u' indicates a Python unicode string, which is normal. In Python 2 the json library returns unicode strings, so your data is being parsed properly.

If for whatever reason you don't want unicode strings in the parsed data, you can load the file with yaml instead:

import yaml

# Close the file handle once loading is done.
with open(file) as f:
    data = yaml.safe_load(f)
print(data)

So you'd get

{'key': 'item'}

instead of

{u'key': u'item'}

That said, I don't see a reason to avoid unicode; for most purposes it makes no difference. (See "Python str vs unicode types".)
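
As a rough illustration of why the unicode strings rarely matter, the sketch below parses a small fragment modelled on the output in the question and looks values up with ordinary string literals. The fragment was trimmed by hand, so treat it as an assumption rather than the real file; the names category and scores are likewise only illustrative.

```python
import json

# Hand-trimmed fragment modelled on the question's output (an assumption,
# not the actual contents of the linked file).
raw = '''
{"document_tone": {"tone_categories": [
  {"category_id": "emotion_tone",
   "category_name": "Emotion Tone",
   "tones": [
     {"score": 0.111838, "tone_id": "anger", "tone_name": "Anger"},
     {"score": 0.507748, "tone_id": "joy", "tone_name": "Joy"}
   ]}
]}}
'''
data = json.loads(raw)

# Plain string literals work as keys even when the parsed keys are unicode.
category = data["document_tone"]["tone_categories"][0]

# Map each tone_id to its score.
scores = {tone["tone_id"]: tone["score"] for tone in category["tones"]}
print(scores["joy"])
```

Lookups, comparisons and iteration behave the same whether the strings are displayed with the u prefix or not.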

Bob Person