
I have a large JSON file that I'm struggling to read and work with in Python. I can, for instance, run json.load() on it, but then it crashes after a while.

There are two existing questions that ask basically the same thing:

Reading rather large JSON files

Is there a memory efficient and fast way to load big JSON files?

But these questions are from 2010 and 2012, so I was wondering if there's a newer/better/faster way to do things?

My file has this structure:

import json

with open('../Data/response.json') as f:
    data = json.load(f)

print(data.keys())  # dict_keys(['item', 'version'])

# Path to data: data['item']

Thanks.

OLGJ
  • Does this answer your question? [Is there a memory efficient and fast way to load big JSON files?](https://stackoverflow.com/questions/2400643/is-there-a-memory-efficient-and-fast-way-to-load-big-json-files) – kaliiiiiiiii Jan 31 '23 at 12:48
  • How big is your .json file, and how much RAM can you afford? – Daweo Jan 31 '23 at 12:48
  • Can you show the code that "crashes"? How large is "large"? – DarkKnight Jan 31 '23 at 13:17
  • @Daweo my RAM is 15.8 GB, file is 285 MB. – OLGJ Jan 31 '23 at 15:27
  • @Pingu the code that "crashes" is just loading the data from disk. Shortly afterward I get "The window is not responding" in VS Code. – OLGJ Jan 31 '23 at 15:29
  • @OLGJ then it should fit, which leads me to wonder whether it is somehow broken rather than just big. I think you should try validating the file with a Python-independent tool (a JSON checker), but I don't have a recommendation for a specific tool. – Daweo Jan 31 '23 at 15:39
  • @kaliiiiiiiii No, I tried the different suggestions, but none of them worked. – OLGJ Jan 31 '23 at 15:41
  • @Daweo aha so you think there might be some formatting issues with the file? – OLGJ Jan 31 '23 at 15:42
  • @OLGJ How does the "crash" manifest itself? Do you see an exception? Also, try running your script outside of VSCode (not that it should matter but it's worth checking) – DarkKnight Jan 31 '23 at 16:10
  • @Pingu the window just stops responding. I am trying to convert the file to a CSV file now to see if that might help. – OLGJ Jan 31 '23 at 16:24
  • @OLGJ FWIW, I have created a JSON file that's a little more than 257 MB on disk. The structure will obviously not be the same as yours, since you've chosen not to share any fragment of it, so the test may not be 100% valid. Opening and loading it (*json.load()*) takes less than 5 seconds, and the entire Python process consumes 1.14 GB of RAM. I conclude that your JSON must be corrupt. – DarkKnight Jan 31 '23 at 16:24
  • @OLGJ How do you propose to convert it to CSV if you can't load it? – DarkKnight Jan 31 '23 at 16:25
  • This is exactly the symptom I get when I try to get Python (IDLE) to print something extremely large. Ctrl-C doesn't interrupt it, and leaving it alone for 24 hours isn't enough to get it to respond. Are you trying to print this JSON object or any part of it? – Mark Ransom Jun 26 '23 at 02:01
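Several comments suspect the file is corrupt rather than too large, and one asks whether the data is being printed. A minimal standard-library sketch that checks both at once: it parses the file and returns a short description instead of echoing the data. `check_json_file` is a hypothetical helper, and the UTF-8 encoding is an assumption about the file.

```python
import json


def check_json_file(path: str) -> str:
    """Parse a JSON file and describe the result without printing the data itself."""
    try:
        with open(path, encoding="utf-8") as fp:  # assumption: the file is UTF-8
            data = json.load(fp)
    except json.JSONDecodeError as err:
        # lineno/colno point at the place where the parser gave up
        return f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
    # Return a short summary rather than the object: dumping a huge object
    # to an IDE console can leave it unresponsive, as one comment describes.
    if isinstance(data, dict):
        return f"valid JSON object with keys {sorted(data)}"
    return f"valid JSON {type(data).__name__}"
```

If this reports a valid object quickly, the freeze is more likely caused by whatever happens after loading (for example printing) than by `json.load()` itself.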

1 Answer


The problem with Python's json module is that json.load() reads the entire file into memory and builds the whole object at once. That makes it fast, but it needs a lot of RAM when the file is big, which can be a problem for large JSON files. The solution is to read one object at a time.
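If the data is stored as a stream of concatenated JSON values (not necessarily one per line), the standard library's `json.JSONDecoder.raw_decode` can pull out one value at a time. This is a minimal sketch, assuming every top-level value is an object or array, so a value cut off at a chunk boundary always fails to parse until the rest has been read. It does not help with a single giant top-level object like the one in the question, because that object still has to be buffered in full. `iter_json_values` is a hypothetical name.

```python
import json
from typing import Any, Iterator, TextIO


def iter_json_values(fp: TextIO, chunk_size: int = 65536) -> Iterator[Any]:
    """Yield top-level JSON values one at a time from a stream of concatenated values.

    Assumption: every top-level value is an object or an array. A bare number
    split across two chunks (e.g. "12" + "3") would be decoded too early.
    """
    decoder = json.JSONDecoder()
    buffer = ""
    eof = False
    while not eof:
        chunk = fp.read(chunk_size)
        eof = chunk == ""  # read() returns "" only at end of file
        buffer += chunk
        while True:
            buffer = buffer.lstrip()  # raw_decode rejects leading whitespace
            if not buffer:
                break
            try:
                value, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                if eof:
                    raise  # leftover text can never become valid JSON
                break  # value is incomplete: read another chunk and retry
            yield value
            buffer = buffer[end:]  # drop the consumed value, keep the rest
```

A value that spans many chunks is re-parsed from its start after each read, so this works best when individual values are much smaller than `chunk_size`.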

One particular file format worth exploring (or at least the ideas behind it) is JSONL (JSON Lines), where each line is a separate JSON object. With this format, you can read a line into memory, do something with it, and then move on to the next. At any moment you only hold that single object in memory, as opposed to the entire file contents.
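A minimal sketch of the JSON Lines idea using only the standard library. `write_jsonl` and `iter_jsonl` are hypothetical helpers, not part of any library, and the sketch assumes UTF-8 files. Reading is lazy: the file object yields one line at a time, so only the current record is parsed and held in memory.

```python
import json
from typing import Any, Iterable, Iterator


def write_jsonl(path: str, records: Iterable[Any]) -> None:
    """Write each record as one compact JSON document per line."""
    with open(path, "w", encoding="utf-8") as fp:
        for record in records:
            # json.dumps escapes newlines inside strings, so each record stays on one line
            fp.write(json.dumps(record) + "\n")


def iter_jsonl(path: str) -> Iterator[Any]:
    """Yield records one at a time; only the current line is held in memory."""
    with open(path, encoding="utf-8") as fp:
        for line in fp:  # file objects iterate lazily, line by line
            line = line.strip()
            if line:  # skip blank lines, e.g. a stray empty line at the end
                yield json.loads(line)
```

For the file in the question, a one-time conversion could call `write_jsonl('../Data/response.jsonl', data['item'])` after the `json.load()` shown there, assuming `data['item']` is a list of records (the output path is made up for illustration). That conversion still needs one full load; afterwards, `for record in iter_jsonl(...)` processes the records one by one.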

Another option (and I know I'm tooting my own horn a little here) is to check out https://pypi.org/project/json-lineage/.

In a nutshell, it's geared towards large files where you want to load one object at a time into memory and work with it before moving on to the next.

Here is a usage example that might do the trick for you:

from json_lineage import load

jsonl_iter = load("path/to/file.json")

for obj in jsonl_iter:
    do_something(obj)  # placeholder for your own per-object processing
Salaah Amin