Loading a file with more than one line of JSON into Pandas

Question

I am trying to read a JSON file into a Python pandas (0.14.0) data frame. Here is the first line of the JSON file:

{"votes": {"funny": 0, "useful": 0, "cool": 0}, "user_id": "P_Mk0ygOilLJo4_WEvabAA", "review_id": "OeT5kgUOe3vcN7H6ImVmZQ", "stars": 3, "date": "2005-08-26", "text": "This is a pretty typical cafe.  The sandwiches and wraps are good but a little overpriced and the food items are the same.  The chicken caesar salad wrap is my favorite here but everything else is pretty much par for the course.", "type": "review", "business_id": "Jp9svt7sRT4zwdbzQ8KQmw"}

I am trying to do the following: df = pd.read_json(path).

I am getting the following error (with full traceback):

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/d/anaconda/lib/python2.7/site-packages/pandas/io/json.py", line 198, in read_json
    date_unit).parse()
  File "/Users/d/anaconda/lib/python2.7/site-packages/pandas/io/json.py", line 266, in parse
    self._parse_no_numpy()
  File "/Users/d/anaconda/lib/python2.7/site-packages/pandas/io/json.py", line 483, in _parse_no_numpy
    loads(json, precise_float=self.precise_float), dtype=None)
ValueError: Trailing data

What is the Trailing data error? How do I read it into a data frame?

Following some suggestions, here are a few lines of the .json file:

{"votes": {"funny": 0, "useful": 0, "cool": 0}, "user_id": "P_Mk0ygOilLJo4_WEvabAA", "review_id": "OeT5kgUOe3vcN7H6ImVmZQ", "stars": 3, "date": "2005-08-26", "text": "This is a pretty typical cafe.  The sandwiches and wraps are good but a little overpriced and the food items are the same.  The chicken caesar salad wrap is my favorite here but everything else is pretty much par for the course.", "type": "review", "business_id": "Jp9svt7sRT4zwdbzQ8KQmw"}
{"votes": {"funny": 0, "useful": 0, "cool": 0}, "user_id": "TNJRTBrl0yjtpAACr1Bthg", "review_id": "qq3zF2dDUh3EjMDuKBqhEA", "stars": 3, "date": "2005-11-23", "text": "I agree with other reviewers - this is a pretty typical financial district cafe.  However, they have fantastic pies.  I ordered three pies for an office event (apple, pumpkin cheesecake, and pecan) - all were delicious, particularly the cheesecake.  The sucker weighed in about 4 pounds - no joke.\n\nNo surprises on the cafe side - great pies and cakes from the catering business.", "type": "review", "business_id": "Jp9svt7sRT4zwdbzQ8KQmw"}
{"votes": {"funny": 0, "useful": 0, "cool": 0}, "user_id": "H_mngeK3DmjlOu595zZMsA", "review_id": "i3eQTINJXe3WUmyIpvhE9w", "stars": 3, "date": "2005-11-23", "text": "Decent enough food, but very overpriced. Just a large soup is almost $5. Their specials are $6.50, and with an overpriced soda or juice, it's approaching $10. A bit much for a cafe lunch!", "type": "review", "business_id": "Jp9svt7sRT4zwdbzQ8KQmw"}

This .json file I am using contains one JSON object in each line as per the specification.

I tried the jsonlint.com website as suggested and it gives the following error:

Parse error on line 14:
...t7sRT4zwdbzQ8KQmw"}{    "votes": {
----------------------^
Expecting 'EOF', '}', ',', ']'

You have additional data in the file that isn't part of the JSON object. — Martijn Pieters, May 06 '15 at 21:37
While the json you show is valid, what you should do first is run http://jsonlint.com/ (or similar tool) before you waste time on invalid data. — Gary Walker, May 06 '15 at 21:44
This example reads in fine for me in pandas 0.16.0. What version of pandas are you using? — Andy Hayden, May 06 '15 at 22:00
@user62198 update to 0.16.0, there's been some fixes to read_json. — Andy Hayden, May 06 '15 at 22:10
Do you load the whole file or each line individually? From the edited post it's clear that you should parse each line individually or alter your JSON file to be like this: [ {...}, {..}, {...} ] — Cornel Ghiban, May 07 '15 at 13:40
@Cornel Ghiban, I can load the whole file or read in an individual line. It seems converting into the format you mentioned might be a bit difficult as there are over 5 million such records. — user62198, May 07 '15 at 13:43
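
To illustrate what the comments above describe: a file with one object per line is not a single JSON document, so a parser that expects exactly one document stops after the first object and complains about whatever follows. The sketch below reproduces this with Python's standard json module, which words the same condition as "Extra data" (the "Trailing data" message in the traceback comes from the parser pandas uses). The two-line string is a shortened, illustrative stand-in for the real file.

```python
import json

# Two JSON objects separated by a newline: valid line by line,
# but not valid as one JSON document (shortened stand-in for the real file).
two_objects = '{"stars": 3, "type": "review"}\n{"stars": 3, "type": "review"}\n'

try:
    json.loads(two_objects)  # expects exactly one JSON document
    error = None
except json.JSONDecodeError as exc:
    error = exc  # raised because a second object follows the first

print(error)

# Parsing each non-empty line on its own succeeds.
records = [json.loads(line) for line in two_objects.splitlines() if line.strip()]
print(len(records))
```

Parsing per line also avoids rewriting a multi-million-record file into one large [ {...}, {...} ] array.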

score 357 · Accepted Answer · answered Dec 19 '16 at 16:04


From version 0.19.0 of Pandas you can use the lines parameter, like so:

import pandas as pd

data = pd.read_json('/path/to/file.json', lines=True)
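
For very large files (the question mentions over 5 million records), read_json also accepts a chunksize argument together with lines=True, which returns an iterator of data frames instead of one large frame. This is a minimal sketch, not part of the original answer: it uses an in-memory io.StringIO buffer in place of a real file, with three records trimmed down from the question, and a chunk size of 2 chosen only for illustration.

```python
import io
import pandas as pd

# In-memory stand-in for a line-delimited JSON file (fields trimmed from the question's records).
sample = io.StringIO(
    '{"review_id": "OeT5kgUOe3vcN7H6ImVmZQ", "stars": 3, "date": "2005-08-26"}\n'
    '{"review_id": "qq3zF2dDUh3EjMDuKBqhEA", "stars": 3, "date": "2005-11-23"}\n'
    '{"review_id": "i3eQTINJXe3WUmyIpvhE9w", "stars": 3, "date": "2005-11-23"}\n'
)

# With chunksize set, read_json returns an iterable reader rather than a DataFrame.
reader = pd.read_json(sample, lines=True, chunksize=2)

# Each chunk is a DataFrame with at most 2 rows; process or filter them one by one.
chunks = [chunk for chunk in reader]

# Concatenate only if the combined result fits in memory.
df = pd.concat(chunks, ignore_index=True)
print(df.shape)
```

In real use you would pass the file path instead of the buffer and reduce each chunk (filter, aggregate) before combining.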

answered by Andrew


Any idea how to get a workaround of this issue relevant to the `lines` argument? https://github.com/pandas-dev/pandas/issues/15132 – Chuck Mar 13 '17 at 11:50

score 34 · Answer 2 · answered Jul 16 '15 at 15:23

You have to read it line by line. For example, you can use the following code provided by ryptophan on reddit:

import io
import pandas as pd

# read the entire file into a Python list of lines
# (text mode 'r', so each line is a str rather than bytes)
with open('your.json', 'r') as f:
    data = f.readlines()

# remove the trailing "\n" from each line and skip blank lines
data = [line.rstrip() for line in data if line.strip()]

# each element of 'data' is an individual JSON object.
# convert it into an *array* of JSON objects, which is itself
# one valid JSON document: add square brackets to the beginning
# and end, and separate the individual objects with commas
data_json_str = "[" + ",".join(data) + "]"

# now, load it into pandas (wrapped in StringIO because newer pandas
# versions deprecate passing a literal JSON string to read_json)
data_df = pd.read_json(io.StringIO(data_json_str))

Hi, I am trying to read un json file and store into dataframe. However, when I used your code, I got an error : "TypeError: sequence item 0: expected str instance, bytes found". Do you know what wrong with it ? — ngoduyvu, Aug 20 '17 at 09:51
change 'rb' in line 4 to 'r' and you should not get the bytes error. — Steven Barnard, Jan 19 '21 at 19:49

score 8 · Answer 3 · edited Jan 11 '20 at 12:37


The following code helped me to load JSON content into a dataframe:

import json
import pandas as pd

with open('Appointment.json', encoding="utf8") as f:
    # parse each non-empty line (a JSON string) into a dict
    data = [json.loads(line) for line in f if line.strip()]
df = pd.DataFrame(data)  # build the dataframe from the list of dicts
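
One thing to watch with the records in the question: each has a nested "votes" object, and a frame built from parsed dicts keeps it as a single column of dicts. If separate columns are wanted, pd.json_normalize (available as a top-level function in pandas 1.0 and later) is one option. This is a sketch under that version assumption, using two records trimmed down from the question rather than a real file.

```python
import json
import pandas as pd

# Two lines trimmed from the question's data (most fields omitted for brevity).
lines = [
    '{"votes": {"funny": 0, "useful": 0, "cool": 0}, "review_id": "OeT5kgUOe3vcN7H6ImVmZQ", "stars": 3}',
    '{"votes": {"funny": 0, "useful": 0, "cool": 0}, "review_id": "qq3zF2dDUh3EjMDuKBqhEA", "stars": 3}',
]

# Parse each line into a dict, as in the answer above.
records = [json.loads(line) for line in lines]

# Flatten nested objects: "votes" becomes votes.funny, votes.useful, votes.cool.
flat = pd.json_normalize(records)
print(sorted(flat.columns))
```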

edited by UserAG
answered May 09 '19 at 13:50 by Triguna

score 4 · Answer 4 · answered Jan 07 '21 at 11:24

I faced the same problem. It happens when your data is written as one object per line, separated by newlines ('\n'). You need to read the file line by line first, then convert each line to Python built-in types. I solved it this way:

import json
import pandas as pd

with open("/path/to/file") as f:
    content = f.readlines()

# json.loads rather than eval: it understands true/false/null and never executes file contents
data = [json.loads(c) for c in content if c.strip()]
data = pd.DataFrame(data)

Good luck!

score 2 · Answer 5 · edited Jul 12 '19 at 16:51


I had a similar problem.

It turns out that pd.read_json('myfile.json') does not report a missing file as such: if the file is not at that path relative to your working directory, you get this 'Trailing data' error instead.

I figured it out when I tried the same path with open('myfile.json', 'r') and got a FileNotFound error, so I checked the paths.

I had failed to move myfile.json into the same folder as my notebook.

Changing it to pd.read_json('../myfile.json') just worked.
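
A small guard can make this kind of path mistake obvious before pandas gets involved. The helper below is a hypothetical sketch (the name read_json_lines is not a pandas function): it checks that the path points to an existing file and raises FileNotFoundError itself, then reads the file as line-delimited JSON.

```python
from pathlib import Path

import pandas as pd


def read_json_lines(path):
    """Hypothetical helper: fail early on a bad path, then read line-delimited JSON."""
    path = Path(path)
    if not path.is_file():
        # Show the resolved absolute path so a wrong working directory is easy to spot.
        raise FileNotFoundError(f"No such file: {path.resolve()}")
    return pd.read_json(path, lines=True)
```

Calling read_json_lines('myfile.json') from the wrong folder then fails with a message naming the absolute path that was tried.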

edited by Peter Mortensen
answered Jun 18 '19 at 16:39 by szeitlin


It's silly that it gives a `ValueError: Trailing data` when it should give a `FileNotFound`. This happened to me as well. – ProGirlXOXO Aug 25 '20 at 20:11
