I want to load a JSONL file as JSON objects in Python. Is there an easy way to do so?
-
How? Do you want to make each line an entry in an array? A JSON file *can't store* multiple items without them being in a top-level object -- that's the whole point of having the JSONL format at all. – Charles Duffy May 22 '18 at 20:03
-
...and is there really a reason you want to do this in Python, vs. as a one-liner in `jq`? – Charles Duffy May 22 '18 at 20:03
-
Thx for the comment. I haven't really used these formats before, and obviously I didn't really get the point of JSONL; your comment helped me do so! Actually the given answer is still useful to me, so I edited the question so that it makes sense. Thx again for the comment! – MBT May 22 '18 at 20:49
5 Answers
75
Full steps including file operations for beginners like me
Assuming you have a .jsonl file like:
{"reviewerID": "A2IBPI20UZIR0U", "asin": "1384719342", "reviewerName": "cassandra tu \"Yeah, well, that's just like, u...", "helpful": [0, 0], "reviewText": "Not much to write about here, but it does exactly what it's supposed to. filters out the pop sounds. now my recordings are much more crisp. it is one of the lowest prices pop filters on amazon so might as well buy it, they honestly work the same despite their pricing,", "overall": 5.0, "summary": "good", "unixReviewTime": 1393545600, "reviewTime": "02 28, 2014"}
{"reviewerID": "A14VAT5EAX3D9S", "asin": "1384719342", "reviewerName": "Jake", "helpful": [13, 14], "reviewText": "The product does exactly as it should and is quite affordable.I did not realized it was double screened until it arrived, so it was even better than I had expected.As an added bonus, one of the screens carries a small hint of the smell of an old grape candy I used to buy, so for reminiscent's sake, I cannot stop putting the pop filter next to my nose and smelling it after recording. :DIf you needed a pop filter, this will work just as well as the expensive ones, and it may even come with a pleasing aroma like mine did!Buy this product! :]", "overall": 5.0, "summary": "Jake", "unixReviewTime": 1363392000, "reviewTime": "03 16, 2013"}
This code should work:
import json

with open('./data/my_filename.jsonl', 'r') as json_file:
    json_list = list(json_file)

for json_str in json_list:
    result = json.loads(json_str)
    print(f"result: {result}")
    print(isinstance(result, dict))
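For large files, it may be preferable not to materialize every line with `list(json_file)` first. A minimal sketch, assuming a hypothetical helper named `iter_jsonl` (not part of any library) and UTF-8 input, that yields one parsed object at a time:

```python
import json
import os
import tempfile


def iter_jsonl(path):
    """Yield one parsed JSON value per line of the file at `path`."""
    # iter_jsonl is a hypothetical helper name; UTF-8 encoding is an assumption.
    with open(path, "r", encoding="utf-8") as json_file:
        for json_str in json_file:
            yield json.loads(json_str)


# Demo on a throwaway file so the sketch runs on its own.
with tempfile.NamedTemporaryFile(
    "w", suffix=".jsonl", delete=False, encoding="utf-8"
) as tmp:
    tmp.write('{"summary": "good", "overall": 5.0}\n')
    tmp.write('{"summary": "Jake", "overall": 5.0}\n')
    demo_path = tmp.name

summaries = [record["summary"] for record in iter_jsonl(demo_path)]
os.remove(demo_path)
```

Because the function is a generator, only the current line is held in memory while the caller loops over it.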
About .jsonl files: http://jsonlines.org/

cryanbhu
75
Setting the `lines` parameter to `True` should do the trick. The result is a pandas DataFrame with one row per line of the file.
import pandas as pd
jsonObj = pd.read_json(path_or_buf=file_path, lines=True)

Bahri Mohamed Aziz
-
This one-liner is more helpful than the traditional method, thanks. – Yash Kumar Atri Feb 15 '22 at 13:10
30
`splitlines` would address that problem for you, so in general the code below will work:
import json
result = [json.loads(jline) for jline in jsonl_content.splitlines()]
If the content comes from a response object, the result would be:
result = [json.loads(jline) for jline in response.read().splitlines()]
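A small sketch of how this behaves when a string value contains a line break. It also illustrates one caveat: `str.splitlines()` splits on more than `\n` (for example U+2028), and JSON allows that character unescaped inside strings, so splitting on `"\n"` alone can be the safer choice for text that was not produced by Python's default `json.dumps`. The sample records are made up for illustration:

```python
import json

# Two records; the first contains a newline inside a string value.
records = [{"text": "line one\nline two"}, {"text": "plain"}]

# json.dumps escapes the newline as the two characters backslash + n,
# so each serialized record stays on one physical line.
jsonl_content = "\n".join(json.dumps(r) for r in records)
parsed = [json.loads(jline) for jline in jsonl_content.splitlines()]

# Caveat: with ensure_ascii=False, U+2028 is written out unescaped,
# and str.splitlines() treats it as a line boundary.
raw = json.dumps({"text": "a\u2028b"}, ensure_ascii=False)
pieces = raw.splitlines()      # the single record is cut in two
safe_pieces = raw.split("\n")  # splitting on "\n" only keeps it whole
```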

Andriy Ivaneyko
-
@CMCDragonkai good point, updated answer, splitlines would cover that case. – Andriy Ivaneyko Sep 02 '20 at 09:35
-
How does `splitlines` cover that? If the JSON object internally has a newline, then it would be split at that point. – CMCDragonkai Sep 03 '20 at 05:37
-
@CMCDragonkai read Python `splitlines` doc: The splitlines() method splits a string into a list. The splitting is done at line breaks. So it divides it into list and then his code loads it one by one – cryanbhu Oct 19 '20 at 05:02
-
@CMCDragonkai a newline within a string is serialized from `\n` -> `\\n`, which isn't considered a line break. So the line breaks of the object represented by the JSON string are retained. – Andriy Ivaneyko Oct 20 '20 at 12:12
27
Quick and easy native solution without using any split() functions:
import json

with open('/path/to/file.jsonl') as f:
    data = [json.loads(line) for line in f]
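One caveat with the comprehension: `json.loads` raises `json.JSONDecodeError` on an empty or whitespace-only line, which some files contain. A hedged sketch that skips such lines and reports the line number of any malformed one; `load_jsonl` is a hypothetical helper name, and it accepts any iterable of lines, so an open file works as well:

```python
import io
import json


def load_jsonl(lines):
    """Parse an iterable of JSONL lines, skipping blank ones."""
    data = []
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore empty or whitespace-only lines
        try:
            data.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(
                f"line {line_number}: invalid JSON ({exc.msg})"
            ) from exc
    return data


# io.StringIO stands in for an open file here.
f = io.StringIO('{"a": 1}\n\n{"a": 2}\n')
data = load_jsonl(f)
```

Whether to skip blank lines or treat them as errors depends on how strictly the file is expected to follow the format.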

MBT
3
You can add more keys, but this should work. Say each line has the following format. `j_line` is a dictionary, so you access each element the way you would access any dictionary. I have also shown how to access nested objects.
{"key1":"value", "key2":{"prop_1": "value"}}
import json

with open("foo.jsonl") as f1:
    for line in f1:
        j_line = json.loads(line)  # each line parses to a dict
        key_1 = j_line['key1']
        prop_1 = j_line['key2']['prop_1']  # nested access
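If some lines might lack a key, indexing with `[]` raises `KeyError`. A small sketch using `dict.get` with defaults instead; the second sample line and the `None` fallback are assumptions made for illustration:

```python
import json

lines = [
    '{"key1": "value", "key2": {"prop_1": "value"}}',
    '{"key1": "other"}',  # no "key2" on this line
]

rows = []
for line in lines:
    j_line = json.loads(line)
    key_1 = j_line["key1"]
    # .get returns the default instead of raising KeyError when a key is absent
    prop_1 = j_line.get("key2", {}).get("prop_1")
    rows.append((key_1, prop_1))
```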

san