
I have 121 JSON files that I need to perform some analysis on, so I need to append these files into a single dataframe and then run the analysis. I can do it in batches, but the issue is that the data is not sorted across the files. What is an efficient way to combine these files into a single dataframe? I tried the code below (the inefficient approach):

# combining the data
# (dd is assumed to be dask.dataframe; combineDf is an initially empty dataframe defined earlier)
import dask.dataframe as dd

for file in files:
    print("Appending: " + file)
    currentDF = dd.read_json(myPath + "\\" + file, lines=True)
    combineDf = combineDf.append(currentDF, ignore_index=True)
Monish K

2 Answers


You can first dump the content of all the JSON files into a single file using json.dump().

Then use pd.read_json to read that merged file.
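For illustration, a minimal sketch of that approach (assuming each file contains a JSON array of records; 'src' and 'merged.json' are placeholder paths):

import json
from pathlib import Path

import pandas as pd

records = []
for path in Path('src').glob('*.json'):          # 'src' is a placeholder folder
    with open(path, encoding='utf8') as f:
        records.extend(json.load(f))             # assumes each file holds a list of records

# dump everything into one merged file, then read it back with pandas
with open('merged.json', 'w', encoding='utf8') as f:
    json.dump(records, f)

df = pd.read_json('merged.json', orient='records')

Note that pd.DataFrame(records) would also build the dataframe directly, without the intermediate merged file.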

Wasif

To combine multiple JSON files you don't need pandas at all. Try:

import json
from pathlib import Path

files = Path('src').rglob('*.json')
d = []  # combined records from all files

for file in files:
    with open(file) as f:
        j = json.load(f)
    d.extend(j)  # assumes each file contains a list of records

d is your combined output.
You may then load it into a pandas DataFrame if you wish.
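For example, a sketch of that last step (assuming the combined records are flat dictionaries):

import pandas as pd

df = pd.DataFrame(d)          # or pd.json_normalize(d) for nested records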

Sergey Bushmanov
  • Hey, thanks for the reply. The files that I am using have the extension .ldjson. When I tried your code I got this error: "'charmap' codec can't decode byte 0x9d in position 2717135: character maps to <undefined>". I got this before when I used json, so I used pandas to load the JSON files. – Monish K Oct 05 '20 at 05:01
  • @MonishK Are you on Windows, maybe? If so, one of these workarounds may help: https://stackoverflow.com/questions/9233027/unicodedecodeerror-charmap-codec-cant-decode-byte-x-in-position-y-character – Nick ODell Oct 05 '20 at 05:17
  • Thanks @NickODell. But now I am getting "Extra data: line 2 column 1 (char 9136)". I had tried this before. `for file1 in files1: with open(file1, encoding='utf8') as f: j = json.load(f) d.extend(j)` – Monish K Oct 05 '20 at 05:25
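
For reference, the "Extra data" error typically means each .ldjson file holds one JSON record per line rather than a single JSON document, so json.load on the whole file fails after the first record. A minimal sketch combining the workarounds from the comments, parsing line by line with an explicit encoding ('src' is a placeholder folder):

import json
from pathlib import Path

import pandas as pd

records = []
for path in Path('src').rglob('*.ldjson'):
    # explicit utf-8 avoids the Windows 'charmap' decode error mentioned above
    with open(path, encoding='utf8') as f:
        for line in f:
            if line.strip():
                records.append(json.loads(line))   # one JSON object per line

df = pd.DataFrame(records)

Alternatively, reading each file with pd.read_json(path, lines=True) and combining the pieces with a single pd.concat call avoids the repeated-append pattern from the question.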