
I am trying to load the file business.json in Python. It comes from the Yelp academic dataset made available for their academic challenge (see https://www.yelp.com/dataset/documentation/json). My goal is to extract all restaurants and their IDs, then find the one restaurant I am interested in. Once I have that restaurant's ID, I want to load review.json and extract all reviews for that restaurant. Sadly, I am stuck at the initial stage of loading the .json file.

this is what business.json looks like:

{
    // string, 22 character unique string business id
    "business_id": "tnhfDv5Il8EaGSXZGiuQGg",

    // string, the business's name
    "name": "Garaje",

    // string, the neighborhood's name
    "neighborhood": "SoMa",

    // string, the full address of the business
    "address": "475 3rd St",

    // string, the city
    "city": "San Francisco",

    // string, 2 character state code, if applicable
    "state": "CA",

    // string, the postal code
    "postal code": "94107",

    // float, latitude
    "latitude": 37.7817529521,

    // float, longitude
    "longitude": -122.39612197,

    // float, star rating, rounded to half-stars
    "stars": 4.5,

    // integer, number of reviews
    "review_count": 1198,

    // integer, 0 or 1 for closed or open, respectively
    "is_open": 1,

    // object, business attributes to values. note: some attribute values might be objects
    "attributes": {
        "RestaurantsTakeOut": true,
        "BusinessParking": {
            "garage": false,
            "street": true,
            "validated": false,
            "lot": false,
            "valet": false
        },
    },

    // an array of strings of business categories
    "categories": [
        "Mexican",
        "Burgers",
        "Gastropubs"
    ],

    // an object of key day to value hours, hours are using a 24hr clock
    "hours": {
        "Monday": "10:00-21:00",
        "Tuesday": "10:00-21:00",
        "Friday": "10:00-21:00",
        "Wednesday": "10:00-21:00",
        "Thursday": "10:00-21:00",
        "Sunday": "11:00-18:00",
        "Saturday": "10:00-21:00"
    }
}

When I try to import business.json with the following code:

import json

jsonBus = json.loads(open('business.json').read())
for item in jsonBus:
    name = item.get("Name")
    businessID = item.get("business_id")

I get the following error:

runfile('/Users/Nico/Google Drive/Python/yelp/yelp_academic.py', wdir='/Users/Nico/Google Drive/Python/yelp')
Traceback (most recent call last):

  File "<ipython-input-46-68ba9d6458bc>", line 1, in <module>
    runfile('/Users/Nico/Google Drive/Python/yelp/yelp_academic.py', wdir='/Users/Nico/Google Drive/Python/yelp')

  File "/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 710, in runfile
    execfile(filename, namespace)

  File "/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 101, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "/Users/Nico/Google Drive/Python/yelp/yelp_academic.py", line 3, in <module>
    jsonBus = json.loads(open('business.json').read())

  File "/anaconda3/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)

  File "/anaconda3/lib/python3.6/json/decoder.py", line 342, in decode
    raise JSONDecodeError("Extra data", s, end)

JSONDecodeError: Extra data

Does anyone know why this error appears?

I am also open to any smarter way to proceed!

Best,

Nico

OneCricketeer
Nico

2 Answers


If your JSON file is exactly as shown in the question, the problem is the comments (e.g. // string, 22 character unique string business id). Comments are not part of the JSON standard, so a strict parser such as Python's json module will reject them.

Please see a related post here: Can comments be used in JSON?
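To illustrate the point about comments, here is a minimal sketch using Python's standard `json` module. The two-field snippet is a hypothetical, shortened version of the documentation sample, not the real file contents.

```python
import json

# Hypothetical snippet in the style of the documentation sample
commented = '''{
    // string, the business's name
    "name": "Garaje"
}'''

try:
    json.loads(commented)
    parsed_ok = True
except json.JSONDecodeError as err:
    # The strict parser stops at the "//" comment
    parsed_ok = False
    print("Rejected:", err.msg)

# The same object without the comment parses fine
clean = json.loads('{"name": "Garaje"}')
print(clean["name"])
```

Note that this only applies if the comments are really in the file; it does not explain an "Extra data" error on a file without them.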

Kirill Pavlov
  • This is a copy-paste from the Yelp website; I don't think the comments are in the actual JSON file – Nico Oct 08 '17 at 02:30
  • I'm using the same dataset and am getting the same error OP is. The *only* `//` in the file I replaced with "aka" ([it was in a place name](https://i.stack.imgur.com/N7cmi.jpg)). The JSON looks legit otherwise, no comments in there. [Here's a screenshot of the JSON in SublimeText](https://i.stack.imgur.com/A2cja.jpg). The way OP shows it, is just from that link. It's not actually how the data is laid out in the files. – BruceWayne Oct 27 '17 at 23:02
  • By using `json_data = json.loads('business.json')` I get almost the same error, it's `raise JSONDecodeError("Expecting value", s, err.value) from None \n json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)` – BruceWayne Oct 27 '17 at 23:09

I think this works - I'm working with the same dataset and had similar errors. Saw a comment here that seems to work.

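import json

# business.json holds one JSON object per line, so parse it line by line
with open('business.json', encoding='utf-8') as f:
    js = [json.loads(line) for line in f]

for item in js:
    name = item.get("name")
    businessID = item.get("business_id")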

However, I'm still wondering why json.loads() doesn't work. The file itself looks fine.
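As for the original goal (find one restaurant's ID, then pull its reviews), the same line-by-line approach can probably be extended. The following is a rough sketch, not a tested solution for the real dataset: the helper names `iter_json_lines`, `find_business_ids` and `reviews_for` are made up for illustration, and it assumes review.json is also laid out as one object per line with a `business_id` field, as the Yelp documentation suggests.

```python
import json

def iter_json_lines(path):
    """Yield one parsed object per non-empty line of a one-object-per-line file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

def find_business_ids(business_path, name):
    """Return the business_id of every business whose name matches exactly."""
    return [b["business_id"] for b in iter_json_lines(business_path)
            if b.get("name") == name]

def reviews_for(review_path, business_id):
    """Return all review objects that belong to the given business."""
    return [r for r in iter_json_lines(review_path)
            if r.get("business_id") == business_id]
```

Usage would look something like `ids = find_business_ids("business.json", "Garaje")` followed by `reviews_for("review.json", ids[0])`. Because the files are read one line at a time, the review file never has to be parsed as a single document, although `reviews_for` still collects the matching reviews in a list.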

BruceWayne
  • `json.loads()` parses a string, not a file, and expects that string to contain exactly one JSON document. This file instead contains one JSON object on each line – OneCricketeer Oct 28 '17 at 00:11
  • @cricket_007 - Ooohhhhh okay - I'm new to json (obviously :P ) and didn't realize that. Thanks for your note!! – BruceWayne Oct 28 '17 at 00:17
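To make the explanation in the comments concrete, here is a small sketch that reproduces the "Extra data" error with two hypothetical, shortened records held in a string, then parses the same text one line at a time.

```python
import json

# Two complete JSON objects, one per line (hypothetical, shortened records)
text = (
    '{"business_id": "a1", "name": "Garaje"}\n'
    '{"business_id": "b2", "name": "Other"}\n'
)

try:
    # Parsing the whole text at once fails: after the first object ends,
    # the decoder finds a second object it did not expect
    json.loads(text)
    error_msg = None
except json.JSONDecodeError as err:
    error_msg = err.msg
    print(err)

# Parsing each line separately works
records = [json.loads(line) for line in text.splitlines() if line.strip()]
print(len(records))
```

The decoder reads the first object successfully and then complains about everything after it, which is why the message says "Extra data" rather than pointing at a syntax problem.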