I'm trying to turn a JSON file into a pandas DataFrame using the code below, and I'm getting the error below. The JSON file is from the Yelp academic dataset, in case anyone is familiar with it.
The code comes from the post below:
If anyone can spot the issue and let me know how to fix it, or suggest an alternative, I'd be grateful. I always have a hard time working with JSON.
Code:
import pandas as pd
import json
# reading the JSON data using json.load()
file = 'dataset/business.json'
with open(file) as train_file:
    dict_train = json.load(train_file)
# converting json dataset from dictionary to dataframe
train = pd.DataFrame.from_dict(dict_train, orient='index')
train.reset_index(level=0, inplace=True)
Error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-4-b7e90dfd32af> in <module>()
5 file = 'dataset/business.json'
6 with open(file) as train_file:
----> 7 dict_train = json.load(train_file)
8
9 # converting json dataset from dictionary to dataframe
/Users/scotsditch/anaconda/lib/python2.7/json/__init__.pyc in load(fp, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
289 parse_float=parse_float, parse_int=parse_int,
290 parse_constant=parse_constant, object_pairs_hook=object_pairs_hook,
--> 291 **kw)
292
293
/Users/scotsditch/anaconda/lib/python2.7/json/__init__.pyc in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
337 parse_int is None and parse_float is None and
338 parse_constant is None and object_pairs_hook is None and not kw):
--> 339 return _default_decoder.decode(s)
340 if cls is None:
341 cls = JSONDecoder
/Users/scotsditch/anaconda/lib/python2.7/json/decoder.pyc in decode(self, s, _w)
365 end = _w(s, end).end()
366 if end != len(s):
--> 367 raise ValueError(errmsg("Extra data", s, end, len(s)))
368 return obj
369
ValueError: Extra data: line 2 column 1 - line 156640 column 1 (char 731 - 132272455)
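A note on reading the error: "Extra data" starting at line 2 column 1 (char 731) means the decoder finished parsing one complete JSON value on line 1 and then found more content after it. That pattern suggests the file holds one JSON object per line rather than a single JSON document, which json.load() cannot parse in one call. A minimal sketch of an alternative, assuming that one-object-per-line layout (the helper name read_json_lines is hypothetical, not part of pandas or the original post):

```python
import io
import json

import pandas as pd


def read_json_lines(path):
    """Parse a file with one JSON object per line into a DataFrame.

    Hypothetical helper: assumes every non-blank line is a complete
    JSON object, which is what the "Extra data" error hints at.
    """
    records = []
    # io.open accepts an encoding argument on both Python 2.7 and Python 3
    with io.open(path, encoding='utf-8') as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    # Each parsed dict becomes one row; its keys become the columns
    return pd.DataFrame(records)
```

With this in place, train = read_json_lines('dataset/business.json') would build the frame with one row per line of the file. If the installed pandas is 0.19 or later, pd.read_json('dataset/business.json', lines=True) should give a similar result without the manual loop.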