I'm using Python 2.7.6 to parse a JSON file, but I'm getting an error and I'm not sure why. This is my first time working with Python, so this might be a very basic issue, but I've searched Stack Overflow and can't figure out what's wrong.

Here is my Python code to parse the data:

import json
from pprint import pprint 

with open ('test.json') as data_file:
    data = json.load(data_file);
pprint(data)  

and here is my JSON file:

{"votes": {"funny": 0, "useful": 0, "cool": 0}, "user_id": "hckr9Hf8BUHcXfOSDv9eJA", "review_id": "K6EEJo0I8AbwGWvwe5SJYQ", "stars": 5, "date": "2013-05-05", "text": "This place is fantastic. they have a restaurant inside the grocery store. very good food.", "type": "review", "business_id": "uPezkdNi_x_SwWlf_2rcMw"}
{"votes": {"funny": 0, "useful": 0, "cool": 1}, "user_id": "PK3TxomYLwZuOXonmYqjNw", "review_id": "5ivy-tczAQ4WYrmVF6YoKg", "stars": 5, "date": "2013-08-11", "text": "This is going to be a place we go back to many times!", "type": "review", "business_id": "UB2j_EV3CIM_E4LcpadKMQ"}

This is the error I get when I parse the JSON:

File "./parse.py", line 6, in <module>
    data = json.load(data_file);
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 290, in load
    **kw)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 368, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 2 column 1 - line 3 column 1 (char 318 - 600)

Oddly enough, if I keep only the first line of JSON, I can parse the data successfully. Any help understanding the error and fixing it would be greatly appreciated.

user2604504
  • You want `for line in data_file` - or, enclosing brackets `[]` and commas at the end of your lines. – g.d.d.c Mar 07 '14 at 06:01
  • At the top level, a valid JSON file *must* be either a *single* object `{...}` or a *single* array `[...]`; you can't have two objects back-to-back. – Adam Rosenfield Mar 07 '14 at 06:09
  • The data is not valid JSON, but it **is** valid JSONL - a related format that just puts a separate valid JSON document on each line of the file. `line 2 column 1` in the error message is a red flag for this issue. – Karl Knechtel Jul 03 '22 at 22:39
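
The line-by-line approach suggested in the comments above could look roughly like this. It is a minimal sketch, not a definitive fix: it assumes every non-blank line holds exactly one complete JSON document, the helper name `parse_json_lines` is made up for illustration, and an in-memory `io.StringIO` with shortened records stands in for `open('test.json')`.

```python
import io
import json


def parse_json_lines(lines):
    """Parse an iterable of text lines, one JSON document per line.

    Hypothetical helper for illustration; not part of the json module.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines, e.g. a trailing newline
            continue
        records.append(json.loads(line))
    return records


# Stand-in for the real file; the records are shortened versions
# of the two reviews in the question.
sample = io.StringIO(
    u'{"stars": 5, "date": "2013-05-05", "type": "review"}\n'
    u'{"stars": 5, "date": "2013-08-11", "type": "review"}\n'
)
reviews = parse_json_lines(sample)
```

With the real file, passing the open file object (`with open('test.json') as data_file: reviews = parse_json_lines(data_file)`) should behave the same way, since iterating over a file yields one line at a time.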

1 Answer

That's not valid JSON: you can't just put two objects (hashes) next to each other like that. Try something like the following. Note that I put a comma between the objects and wrapped the whole set in an array.

[
   {
      "stars" : 5,
      "date" : "2013-05-05",
      "review_id" : "K6EEJo0I8AbwGWvwe5SJYQ",
      "text" : "This place is fantastic. they have a restaurant inside the grocery store. very good food.",
      "user_id" : "hckr9Hf8BUHcXfOSDv9eJA",
      "type" : "review",
      "votes" : {
         "funny" : 0,
         "cool" : 0,
         "useful" : 0
      },
      "business_id" : "uPezkdNi_x_SwWlf_2rcMw"
   },
   {
      "stars" : 5,
      "date" : "2013-08-11",
      "review_id" : "5ivy-tczAQ4WYrmVF6YoKg",
      "text" : "This is going to be a place we go back to many times!",
      "user_id" : "PK3TxomYLwZuOXonmYqjNw",
      "type" : "review",
      "votes" : {
         "funny" : 0,
         "cool" : 1,
         "useful" : 0
      },
      "business_id" : "UB2j_EV3CIM_E4LcpadKMQ"
   }
]
erik258
  • Interesting, because I got this from a Yelp data set. So they are giving me invalid JSON? – user2604504 Mar 07 '14 at 06:14
  • The two JSON hashes are independently valid, so perhaps you're just concatenating them together in a way they didn't expect? Or maybe they're meant to be valid independently, not together. You could always parse them one at a time (line by line, as @g.d.d.c suggested) if you feel like it! – erik258 Mar 07 '14 at 06:31
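
If the objects really are just concatenated and you can't rely on one object per line, one possible way to parse them one at a time is `json.JSONDecoder.raw_decode`, which decodes a single document from a given position and returns it together with the index where it stopped. The sketch below is only an illustration under that assumption: the generator name `iter_concatenated_json` and the sample text are invented, and it expects the whole input to fit in memory as one string.

```python
import json


def iter_concatenated_json(text):
    """Yield each JSON document found back-to-back in `text`.

    Hypothetical helper for illustration; not part of the json module.
    """
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # Returns the decoded object and the index just past it.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Made-up sample: two objects separated only by a space.
text = '{"stars": 5, "votes": {"cool": 0}} {"stars": 4, "votes": {"cool": 1}}\n'
objects = list(iter_concatenated_json(text))
```

For a file like the one in the question, where each object sits on its own line, calling `json.loads` on each line is simpler; this variant is only worth it when the separators are unpredictable.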