
I have 121 JSON files that I need to perform some analysis on, so I need to append these files into a single dataframe and then run the analysis. I can do it in batches, but the issue is that the data is not sorted across the files. What is an efficient way to combine these files into a single dataframe? I tried the code below (the inefficient approach):

# combining the data
# (dd is assumed to be dask.dataframe; combineDf is an initially empty dataframe defined earlier)
import dask.dataframe as dd

for file in files:
    print("Appending: " + file)
    currentDF = dd.read_json(myPath + "\\" + file, lines=True)
    combineDf = combineDf.append(currentDF, ignore_index=True)
Monish K

2 Answers


You can first dump the content of all the JSON files into a single file using json.dump().

Then use pd.read_json to read that merged file.
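For illustration, a minimal sketch of that approach (assuming each file contains a JSON array of records; 'src' and 'merged.json' are placeholder paths):

import json
from pathlib import Path

import pandas as pd

records = []
for path in Path('src').glob('*.json'):          # 'src' is a placeholder folder
    with open(path, encoding='utf8') as f:
        records.extend(json.load(f))             # assumes each file holds a list of records

# dump everything into one merged file, then read it back with pandas
with open('merged.json', 'w', encoding='utf8') as f:
    json.dump(records, f)

df = pd.read_json('merged.json', orient='records')

Note that pd.DataFrame(records) would also build the dataframe directly, without the intermediate merged file.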

Wasif

To combine multiple JSON files you don't need pandas at all. Try:

import json
from pathlib import Path

files = Path('src').rglob('*.json')
d = []  # combined records from all files

for file in files:
    with open(file) as f:
        j = json.load(f)
    d.extend(j)  # assumes each file contains a list of records

d is your combined output.
You may then load it into a pandas DataFrame if you wish.
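For example, a sketch of that last step (assuming the combined records are flat dictionaries):

import pandas as pd

df = pd.DataFrame(d)          # or pd.json_normalize(d) for nested records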

Sergey Bushmanov
  • Hey, thanks for the reply. The files that I am using have the extension .ldjson. When I tried your code I got this error: "'charmap' codec can't decode byte 0x9d in position 2717135: character maps to <undefined>". I got this before when I used json, so I used pandas to load the JSON files. – Monish K Oct 05 '20 at 05:01
  • @MonishK Are you on Windows, maybe? If so, one of these workarounds may help: https://stackoverflow.com/questions/9233027/unicodedecodeerror-charmap-codec-cant-decode-byte-x-in-position-y-character – Nick ODell Oct 05 '20 at 05:17
  • Thanks @NickODell. But now I am getting "Extra data: line 2 column 1 (char 9136)". I had tried this before. `for file1 in files1: with open(file1, encoding='utf8') as f: j = json.load(f) d.extend(j)` – Monish K Oct 05 '20 at 05:25
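
For reference, the "Extra data" error typically means each .ldjson file holds one JSON record per line rather than a single JSON document, so json.load on the whole file fails after the first record. A minimal sketch combining the workarounds from the comments, parsing line by line with an explicit encoding ('src' is a placeholder folder):

import json
from pathlib import Path

import pandas as pd

records = []
for path in Path('src').rglob('*.ldjson'):
    # explicit utf-8 avoids the Windows 'charmap' decode error mentioned above
    with open(path, encoding='utf8') as f:
        for line in f:
            if line.strip():
                records.append(json.loads(line))   # one JSON object per line

df = pd.DataFrame(records)

Alternatively, reading each file with pd.read_json(path, lines=True) and combining the pieces with a single pd.concat call avoids the repeated-append pattern from the question.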