
I have a JSON file containing multiple JSON objects (each object may span multiple lines). Example:

{"date": "2022-11-29", "runs": [{"23597": 821260}, {"23617": 821699}]}
{"date": "2022-11-30", "runs": [{"23597": 821269}, {"23617": 8213534}]}

Note that the file as a whole is indeed not valid JSON (so the regular "read JSON in Python" code fails, as expected), but each individual "fragment" is complete, valid JSON. The file appears to have been produced by a logging tool that simply appends each new block as text to the end of the file.

As expected, the regular way of reading it, which I tried with the snippet below, fails:

import json

with open('run_log.json', 'r') as file:
    d = json.load(file)
    print(d)

This produces the expected error about invalid JSON:

JSONDecodeError: Extra data: line 3 column 1 (char 89)

How can I solve this, possibly using the json module? Ideally, I want to read the JSON file and get the runs list for only a particular date (e.g., 2022-11-30), but just being able to read all entries would be enough.

Edited by Alexei Levenkov
Asked by Dolliy
  • https://stackoverflow.com/search?q=%5Bpython%5D+extra+data+JSONDecodeError. To give some context, I just used "[python]" to tag this and then picked some key parts from the error message. – Ulrich Eckhardt Nov 30 '22 at 18:20
  • Does this answer your question? [Loading and parsing a JSON file with multiple JSON objects](https://stackoverflow.com/questions/12451431/loading-and-parsing-a-json-file-with-multiple-json-objects) – fsimonjetz Nov 30 '22 at 18:22
  • It seems like the JSON is malformed. You have two top-level objects, while JSON allows only one. If you know that each object is on its own line, then [loading and parsing multiple JSON objects per file](https://stackoverflow.com/questions/12451431/loading-and-parsing-a-json-file-with-multiple-json-objects) would help; if that is not the case, you either need to fix whatever writes the JSON or find the split points between objects manually. – Cpt.Hook Nov 30 '22 at 18:24
  • I've updated the post to make it a better signpost duplicate, since such files are sometimes called "JSON fragments" and are commonly produced by logging code (the same issue exists for XML: logging libraries just append XML fragments to a file without maintaining a proper outer structure). – Alexei Levenkov Nov 30 '22 at 18:36
  • The `ndjson` answer below is better, but of course you can do this yourself by reading a line at a time. `for line in file:` / `d = json.loads(line)`. – Tim Roberts Nov 30 '22 at 19:17
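The question says an object may span multiple lines, which is the case Cpt.Hook's comment notes that line-by-line parsing cannot handle. A minimal sketch for that case uses the standard library's json.JSONDecoder.raw_decode, which parses one JSON value starting at a given index and returns the value together with the index just past it. The helper name iter_json_values is hypothetical, and the sketch assumes the whole file fits in memory and that fragments are separated only by whitespace.

```python
import json
from typing import Any, Iterator

# Whitespace characters allowed between JSON values (RFC 8259).
_JSON_WHITESPACE = " \t\n\r"


def iter_json_values(text: str) -> Iterator[Any]:
    """Yield each JSON value from a string of concatenated JSON fragments.

    Hypothetical helper: assumes fragments are separated only by whitespace.
    """
    decoder = json.JSONDecoder()
    idx = 0
    length = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so step over it first.
        while idx < length and text[idx] in _JSON_WHITESPACE:
            idx += 1
        if idx == length:
            return
        # Parse one value starting at idx; idx becomes the index just past it.
        # Raises json.JSONDecodeError if the text at idx is not valid JSON.
        value, idx = decoder.raw_decode(text, idx)
        yield value


# Usage (assumes run_log.json from the question):
#     with open('run_log.json', encoding='utf8') as file:
#         for obj in iter_json_values(file.read()):
#             print(obj)
```

Because it never looks at line boundaries, this works whether each object sits on one line or is pretty-printed across several.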

2 Answers


This is NDJSON (newline-delimited JSON), not JSON.

It's a valid file format that is often confused with JSON.

There is a third-party Python package for it, ndjson (installable with pip install ndjson):

import ndjson

with open('run_log.json','r') as file:
    d = ndjson.load(file)
    for elem in d:
        print(type(elem), elem)

Output:

<class 'dict'> {'date': '2022-11-29', 'runs': [{'23597': 821260}, {'23617': 821699}]}
<class 'dict'> {'date': '2022-11-30', 'runs': [{'23597': 821269}, {'23617': 8213534}]}
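If adding the third-party ndjson dependency is not an option, a list like the one ndjson.load produces above can be built with the standard json module alone. A minimal sketch; load_json_lines is a hypothetical helper name, and skipping blank lines is an assumption about how tolerant the reader should be.

```python
import json
from typing import Any, Iterable, List


def load_json_lines(lines: Iterable[str]) -> List[Any]:
    """Parse newline-delimited JSON: one JSON value per non-blank line."""
    # Any iterable of strings works, including an open text file,
    # which yields one line at a time.
    return [json.loads(line) for line in lines if line.strip()]


# Usage (assumes run_log.json from the question):
#     with open('run_log.json', encoding='utf8') as file:
#         for elem in load_json_lines(file):
#             print(type(elem), elem)
```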
Answered by Edo Akse

Each line is valid JSON (see the JSON Lines format). This makes it a nice format for a logger, since new JSON lines can be appended to the file without the read/modify/write of the whole file that a single JSON document would require.
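The appending side of that logging pattern can be sketched as follows. This is an illustration, not code from the question's logging tool; format_record and append_record are hypothetical names.

```python
import json
from typing import Any, Dict


def format_record(record: Dict[str, Any]) -> str:
    """Serialize one record as a single JSON Lines entry."""
    # Without indent=..., json.dumps writes the whole object on one line, and
    # newlines inside string values are escaped as \n, so each record
    # occupies exactly one line of the file.
    return json.dumps(record) + "\n"


def append_record(path: str, record: Dict[str, Any]) -> None:
    """Append a record to the log without reading or rewriting earlier lines."""
    # Mode "a" makes every write go to the end of the file.
    with open(path, "a", encoding="utf8") as file:
        file.write(format_record(record))
```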

You can use json.loads() to parse it a line at a time.

Given run_log.json:

{"date": "2022-11-29", "runs": [{"23597": 821260}, {"23617": 821699}]}
{"date": "2022-11-30", "runs": [{"23597": 821269}, {"23617": 8213534}]}

Use:

import json

with open('run_log.json', encoding='utf8') as file:
    for line in file:
        data = json.loads(line)
        print(data)

Output:

{'date': '2022-11-29', 'runs': [{'23597': 821260}, {'23617': 821699}]}
{'date': '2022-11-30', 'runs': [{'23597': 821269}, {'23617': 8213534}]}
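The question ultimately asks for the runs list of one particular date. Building on the line-by-line loop above, a minimal sketch; runs_for_date is a hypothetical helper, and it assumes each line holds a JSON object and each date appears at most once, so it returns the first match (or None if no record matches).

```python
import json
from typing import Any, Iterable, List, Optional


def runs_for_date(lines: Iterable[str], date: str) -> Optional[List[Any]]:
    """Return the "runs" list of the first record whose "date" equals `date`."""
    for line in lines:
        if not line.strip():
            # json.loads raises on an empty string, so skip blank lines.
            continue
        record = json.loads(line)
        if record.get("date") == date:
            return record.get("runs")
    # No record matched the requested date.
    return None


# Usage (assumes run_log.json contains the two sample lines above):
#     with open('run_log.json', encoding='utf8') as file:
#         print(runs_for_date(file, '2022-11-30'))
#     # prints [{'23597': 821269}, {'23617': 8213534}]
```

Stopping at the first match means the rest of a large log is never parsed; collecting all matches instead would be a small change if dates can repeat.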
Answered by Mark Tolonen