23

I have a JSON file as follows. It's a list of dicts.

[{"city": "ab", "trips": 4, "date": "2014-01-25", "value": 4.7, "price": 1.1, "request_date": "2014-06-17", "medium": "iPhone", "%price": 15.4, "type": true, "Weekly_pct": 46.2, "avg_dist": 3.67, "avg_price": 5.0}, {"city": "bc", "trips": 0, "date": "2014-01-29", "value": 5.0, "price": 1.0, "request_date": "2014-05-05", "medium": "Android", "%price": 0.0, "type": false, "weekly_pct": 50.0, "avg_dist": 8.26, "avg_price": 5.0}.....]

When I read it like this:

data=pd.read_json('dataset.json')

I get the following error:

ValueError: Expected object or value

I tried this too:

from ast import literal_eval

with open('dataset.json') as f:
    data = literal_eval(f.read())

df = pd.DataFrame(data)

It gives the following error:

ValueError: malformed string

Edit:

Even json.loads doesn't work. I tried this:

import json
data=json.loads('dataset.json')

ValueError: No JSON object could be decoded

The JSON file is 13.5 MB, but it seems to hold a huge amount of data.

jezrael
Baktaawar

6 Answers

17

I think you can use the json module to read file.json and then pass the result to the DataFrame constructor:

import pandas as pd
import json

with open('file.json') as f:
    data = json.load(f)
print(data)
[{u'city': u'ab', u'medium': u'iPhone', u'request_date': u'2014-06-17', u'price': 1.1, u'Weekly_pct': 46.2, u'value': 4.7, u'%price': 15.4, u'avg_price': 5.0, u'date': u'2014-01-25', u'avg_dist': 3.67, u'type': True, u'trips': 4}, {u'city': u'bc', u'medium': u'Android', u'request_date': u'2014-05-05', u'price': 1.0, u'weekly_pct': 50.0, u'value': 5.0, u'%price': 0.0, u'avg_price': 5.0, u'date': u'2014-01-29', u'avg_dist': 8.26, u'type': False, u'trips': 0}]

print(pd.DataFrame(data))

   %price  Weekly_pct  avg_dist  avg_price city        date   medium  price  \
0    15.4        46.2      3.67        5.0   ab  2014-01-25   iPhone    1.1   
1     0.0         NaN      8.26        5.0   bc  2014-01-29  Android    1.0   

  request_date  trips   type  value  weekly_pct  
0   2014-06-17      4   True    4.7         NaN  
1   2014-05-05      0  False    5.0        50.0  
jezrael
  • 1
    I think the example OP gave works, and that the error is buried somewhere in the large file... – IanS Apr 25 '16 at 10:28
  • 1
    Hmm, I get the first error (`ValueError: Expected object or value`) and the second error (`ValueError: malformed string`) with the sample too. But my solution works very well. – jezrael Apr 25 '16 at 10:30
  • OK, I just did what @jezrael suggested, and it worked. However, my column order is different: city should be the first column, but the columns come out in a different order, as in his output. Any idea how to keep the original order of the column names? – Baktaawar Apr 25 '16 at 10:33
  • 1
    @jezrael any idea why `read_json` would fail? And why your solution works? Even `json.loads` (with an s) fails... – IanS Apr 25 '16 at 10:34
  • Yes. json.loads fails, read_json fails, and even ujson.load fails. I tried all three and they failed, but json.load works. – Baktaawar Apr 25 '16 at 10:36
  • 2
    I think it fails because the `json` file contains a list of dictionaries. It is valid `json`, but it seems `read_json` doesn't support this type of JSON. – jezrael Apr 25 '16 at 10:40
  • Sorry, is something wrong? Why did you unaccept? Thanks. – jezrael Apr 26 '16 at 12:14
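
On the column-order question raised in the comments above, here is a minimal sketch of one way to fix the order after loading. The records are a cut-down copy of the question's sample, and the `wanted` list is a hypothetical preferred order, not something taken from the original thread.

```python
import pandas as pd

# Two records shaped like the question's sample (a subset of its keys).
records = [
    {"city": "ab", "trips": 4, "date": "2014-01-25", "value": 4.7},
    {"city": "bc", "trips": 0, "date": "2014-01-29", "value": 5.0},
]

# Hypothetical preferred order; replace with the real column names.
wanted = ["city", "date", "trips", "value"]

# Passing columns= to the constructor selects and orders the columns.
df = pd.DataFrame(records, columns=wanted)

# For a frame that already exists, indexing with the list does the same.
df_reordered = pd.DataFrame(records)[wanted]

print(df.columns.tolist())
```

Either form should give a frame whose columns follow `wanted`, regardless of the order the keys had in the file.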
10

I had the same error. Turns out it couldn't find the file. I modified the path and pd.read_json worked fine. As for json.loads, this might be helpful.
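
A minimal sketch of the two points above, using only the standard library. It writes a small sample file to a temporary directory (an assumption made so the example is self-contained), checks that the path exists, and shows that `json.loads` expects a string of JSON text while `json.load` expects an open file object.

```python
import json
import os
import tempfile

# Write a small sample file so the sketch is self-contained.
sample = [{"city": "ab", "trips": 4}, {"city": "bc", "trips": 0}]
path = os.path.join(tempfile.mkdtemp(), "dataset.json")
with open(path, "w") as f:
    json.dump(sample, f)

# Check the path first: a wrong path is one possible cause of a confusing error.
print("exists:", os.path.exists(path))

# json.loads parses a string containing JSON, so a file name is not valid input.
try:
    json.loads("dataset.json")
    loads_error = None
except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
    loads_error = exc
print("json.loads on a file name failed:", loads_error is not None)

# json.load reads from an open file object instead.
with open(path) as f:
    data = json.load(f)
print(data == sample)
```

If `os.path.exists` reports `False` for your real path, fixing the path is worth trying before anything else.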

Ali Kanat
MoKG
  • 1
    I had the same error because I moved the Jupyter notebook and Jupyter did not update the file path. Pandas returns the worst possible error message here. – Suzana Mar 03 '20 at 16:13
8

You need to tell Pandas that dataset.json uses the "records" format, where the JSON is a list of dictionaries.

res = pd.read_json('input/dataset.json', orient='records')

print(res.iloc[:, :5])
   %price  Weekly_pct  avg_dist  avg_price city
0    15.4        46.2      3.67          5   ab
1     0.0         NaN      8.26          5   bc
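
For a version that can be run without the original file, here is a minimal sketch that first writes a small records-style file to a temporary directory and then reads it back. The sample values and the temporary path are assumptions for illustration; whether `orient='records'` is strictly required may depend on the pandas version.

```python
import json
import os
import tempfile

import pandas as pd

# A list of dicts, i.e. the "records" layout, with a few of the question's keys.
records = [
    {"city": "ab", "trips": 4, "avg_dist": 3.67},
    {"city": "bc", "trips": 0, "avg_dist": 8.26},
]

# Write the sample to disk so read_json has a real file to open.
path = os.path.join(tempfile.mkdtemp(), "dataset.json")
with open(path, "w") as f:
    json.dump(records, f)

# Read it back, stating the layout explicitly.
res = pd.read_json(path, orient="records")
print(res)
```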
Brad Solomon
5

The following worked for me when pd.read_json failed: open file, load with normal json.load, then load into a pandas dataframe.

    import json

    import pandas as pd

    # Parse the file with the standard json module, then build the frame.
    with open('file.json') as openfile:
        jsondata = json.load(openfile)

    df = pd.DataFrame(jsondata)
    print(df)
embulldogs99
0

For me it was a problem with the path. A relative path is resolved against the directory the Python file is run from, not the directory the file lives in. Try to 'cd' into the directory of your Python file first; then data=pd.read_json('dataset.json') should work.
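
An alternative to changing directory is to build the path from the script's own location. This is a minimal sketch, assuming dataset.json sits next to the script; the fallback to the current working directory covers environments where `__file__` is not defined, such as a notebook.

```python
from pathlib import Path

# __file__ exists when the code runs as a script; notebooks do not define it.
try:
    base_dir = Path(__file__).resolve().parent
except NameError:
    base_dir = Path.cwd()

# Assumed layout: the data file lives in the same directory as the script.
data_path = base_dir / "dataset.json"

print("Working directory:", Path.cwd())
print("Looking for:", data_path)
print("Exists:", data_path.exists())
```

The resulting `data_path` can then be passed to `pd.read_json` in place of the bare file name.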

Beeblebrox
0

I had to add the parameter lines=True to make it work, e.g.:

pd.read_json("dataset.json", lines=True)

Alternatively you could do it like this:

import json
import pandas as pd

with open("dataset.json") as f:
    # Parse each line of the file as its own JSON object.
    df = pd.DataFrame([json.loads(line) for line in f])
print(df)  # Shows data frame as expected
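
Both variants above assume the file holds one JSON object per line (often called JSON Lines), rather than a single top-level list as in the question's sample. A minimal sketch of that layout, using a temporary file and made-up sample values:

```python
import json
import os
import tempfile

import pandas as pd

records = [{"city": "ab", "trips": 4}, {"city": "bc", "trips": 0}]

# Write one JSON object per line, with no enclosing list and no commas between lines.
path = os.path.join(tempfile.mkdtemp(), "dataset.jsonl")
with open(path, "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# lines=True tells read_json to parse the file line by line.
df = pd.read_json(path, lines=True)
print(df)
```

If your file starts with `[` and ends with `]`, it is probably not in this layout and `lines=True` is unlikely to be the right fix.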
Fernando Cardenas