I'm trying to turn a JSON file into a pandas DataFrame using the code below, and I'm getting the error below. The JSON file is from the Yelp academic dataset, in case anyone is familiar with it.

The code comes from the post below:

JSON to pandas DataFrame

If anyone can spot the issue and let me know how to fix it, or suggest an alternative, I'd be grateful. I always have a hard time working with JSON.

Code:
import pandas as pd
import json

# reading the JSON data using json.load()
file = 'dataset/business.json'
with open(file) as train_file:
    dict_train = json.load(train_file)

# converting json dataset from dictionary to dataframe
train = pd.DataFrame.from_dict(dict_train, orient='index')
train.reset_index(level=0, inplace=True)

Error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-b7e90dfd32af> in <module>()
      5 file = 'dataset/business.json'
      6 with open(file) as train_file:
----> 7     dict_train = json.load(train_file)
      8 
      9 # converting json dataset from dictionary to dataframe

/Users/scotsditch/anaconda/lib/python2.7/json/__init__.pyc in load(fp, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    289         parse_float=parse_float, parse_int=parse_int,
    290         parse_constant=parse_constant, object_pairs_hook=object_pairs_hook,
--> 291         **kw)
    292 
    293 

/Users/scotsditch/anaconda/lib/python2.7/json/__init__.pyc in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    337             parse_int is None and parse_float is None and
    338             parse_constant is None and object_pairs_hook is None and not kw):
--> 339         return _default_decoder.decode(s)
    340     if cls is None:
    341         cls = JSONDecoder

/Users/scotsditch/anaconda/lib/python2.7/json/decoder.pyc in decode(self, s, _w)
    365         end = _w(s, end).end()
    366         if end != len(s):
--> 367             raise ValueError(errmsg("Extra data", s, end, len(s)))
    368         return obj
    369 

ValueError: Extra data: line 2 column 1 - line 156640 column 1 (char 731 - 132272455)
user3476463
  • show your sample json file – BENY Oct 18 '17 at 01:19
  • @Wen Thank you for getting back to me so quickly. What is a good way to provide a sample of the file? With r I would just use dput. – user3476463 Oct 18 '17 at 02:04
  • You can just paste it here. Python is more flexible about data sources. – BENY Oct 18 '17 at 02:04
  • @Wen The business.json file is pretty large. I tried throwing it in a text editor just to sample some but it locked it up. Is there a way to sample some rows or get a description with python? I'm having trouble reading it in, in the first place. – user3476463 Oct 18 '17 at 02:45
  • Maybe post a sample of your data's structure, or specify which part triggers the error? – BENY Oct 18 '17 at 03:06
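
On the question of sampling a few rows from a file too large for a text editor, here is a minimal sketch that reads only the first lines. The helper name `peek_lines` and the two stand-in records are hypothetical, made up for illustration; for the real data you would pass 'dataset/business.json'. It assumes the file is plain text with one record per line, which the "line 2 column 1" in the error hints at but does not prove.

```python
import itertools
import json
import os
import tempfile


def peek_lines(path, n=3):
    """Return the first n lines of a text file without reading the rest."""
    with open(path) as fh:
        return [line.rstrip("\n") for line in itertools.islice(fh, n)]


# Stand-in file so the sketch runs on its own; both records are made up.
# For the real data, pass 'dataset/business.json' instead.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.json")
with open(sample_path, "w") as fh:
    fh.write('{"business_id": "a1", "stars": 4.0}\n')
    fh.write('{"business_id": "b2", "stars": 3.5}\n')

first_lines = peek_lines(sample_path, n=2)
for line in first_lines:
    print(line[:200])  # truncate long records for display

# If each line parses on its own, the file is probably newline-delimited JSON.
records = [json.loads(line) for line in first_lines]
print(sorted(records[0].keys()))
```

Pasting the printed lines into the question would give commenters the sample structure they asked for.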

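
The "Extra data: line 2 column 1" message suggests that json.load() parsed one complete object on line 1 and then found more content after it. That is what happens when a file holds one JSON object per line rather than a single JSON document. Assuming that is the layout here (not confirmed in the question), a sketch of parsing it line by line could look like this. The helper `load_json_lines` and the stand-in records are hypothetical.

```python
import json
import os
import tempfile


def load_json_lines(path):
    """Parse a newline-delimited JSON file into a list of dicts."""
    records = []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


# Stand-in file so the sketch runs on its own; both records are made up.
# For the real data, pass 'dataset/business.json' instead.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.json")
with open(demo_path, "w") as fh:
    fh.write('{"business_id": "a1", "stars": 4.0}\n')
    fh.write('{"business_id": "b2", "stars": 3.5}\n')

# json.load() on the whole file fails with "Extra data",
# because a second object follows the first.
try:
    with open(demo_path) as fh:
        json.load(fh)
    whole_file_ok = True
except ValueError:  # the "Extra data" error is a ValueError
    whole_file_ok = False

records = load_json_lines(demo_path)
print(whole_file_ok, len(records))
```

With pandas installed, `pd.DataFrame(records)` should then give one row per record. Recent pandas versions also accept `pd.read_json(file, lines=True)` for this layout, which skips the manual loop.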