I want to (pre)process large JSON files (5-10 GB each), each of which contains multiple root elements. These root elements follow one another without any separator, like this: {}{}...
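For illustration, a made-up miniature version of such a file might look like this (the keys and values are just placeholders):

{"id": 1, "value": "a"}{"id": 2, "value": "b"}{"id": 3, "value": "c"}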
So I first wrote the following simple code to turn the content into valid JSON:
import pandas as pd
from io import StringIO

with open(file) as f:
    file_data = f.read()

# insert commas between the root objects and wrap them in a list
file_data = "[" + file_data.replace("}{", "},{") + "]"
# read_json expects a path or file-like object, so wrap the string
df = pd.read_json(StringIO(file_data))
Obviously this doesn't work with large files: even a 400 MB file fails, even though I have 16 GB of memory.
I've read that it's possible to work in chunks, but I can't figure out how to translate this into "chunk logic". Is there a way to "chunkenize" this?
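The direction I have in mind is something like the sketch below, which reads the file in fixed-size pieces and pulls complete root objects out of a buffer with json.JSONDecoder.raw_decode. It's only a rough sketch, not tested on my real data, and the function name iter_json_objects, the file name data.json, and the chunk/batch sizes are placeholders I made up:

import json
import pandas as pd

def iter_json_objects(path, chunk_size=1024 * 1024):
    """Yield root-level JSON objects one at a time without loading the whole file."""
    decoder = json.JSONDecoder()
    buffer = ""
    with open(path) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buffer += chunk
            while buffer:
                try:
                    obj, end = decoder.raw_decode(buffer)
                except json.JSONDecodeError:
                    break  # current object continues in the next chunk
                yield obj
                buffer = buffer[end:].lstrip()

# process the stream in batches instead of one giant DataFrame
batch = []
for obj in iter_json_objects("data.json"):
    batch.append(obj)
    if len(batch) == 100_000:
        df = pd.DataFrame(batch)
        # ... process / write df here, e.g. append to CSV or Parquet ...
        batch.clear()
if batch:
    df = pd.DataFrame(batch)  # leftover objects from the final batch

The batching at the end is meant to avoid ever holding one giant DataFrame in memory; each batch would be processed or written out before the next one is read. Is this the right idea, or is there a more established way to do it?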
I'd be glad for any help.