
I am trying to load a large JSON file (around 4GB) as a pandas DataFrame, but the following method does not work for files larger than around 2GB. Is there any alternative method?

```python
import pandas as pd

data_dir = 'data.json'
my_data = pd.read_json(data_dir, lines=True)
```

I tried ijson but have no idea how to convert it to a DataFrame.

  • What's your RAM? Did you try the built-in `json.loads`? – Or Duan Jul 11 '17 at 07:45
  • Are you using 32-bit or 64-bit Python? – Jonas Adler Jul 11 '17 at 08:22
  • @JonasAdler I'm going to go ahead with the assumption that he's using 32-bit Python; the [~2GB limit](https://stackoverflow.com/a/639562/4022608) would be too much of a coincidence otherwise. – Baldrickk Jul 11 '17 at 08:51
  • To the comments above, I am using 64-bit Python with 8GB of RAM and I still had 55% free, so ideally it should work :). Anyway, thanks to your advice, it's working now with `json.loads` (see the sketch after these comments). – Howell Yu Jul 11 '17 at 09:01
  • Just because the file on disk is 4GB does not mean the representation in memory is 4GB. Python creates an object for every string, which might take more space than it does on disk. – Maarten Fabré Jul 11 '17 at 09:42
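
For reference, the line-by-line `json.loads` approach mentioned in the comments might look like the following. This is a minimal sketch; it assumes `data.json` is newline-delimited JSON (one object per line), as `lines=True` in the question implies:

```python
import json

import pandas as pd

records = []
with open('data.json', 'r', encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        if line:  # skip blank lines
            records.append(json.loads(line))

my_data = pd.DataFrame(records)
```

Note that this still builds the full DataFrame in memory, so it only helps when the parsed records themselves fit in RAM.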

1 Answer


Loading the entire document into memory may not be the best approach in this case. JSON of this size may require a different approach to parsing. Try using a streaming parser instead. Some options:

https://pypi.org/project/json-stream-parser/

https://pypi.org/project/ijson/

The key is to not load the entire document into memory. This is similar to SAX parsing in the XML world.
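
As a rough, untested sketch of how `ijson` output can be turned into a DataFrame: stream one record at a time and flush the accumulated records into DataFrames in batches. The `'item'` prefix here assumes the top level of the document is a single JSON array; if the file is newline-delimited JSON instead (as `lines=True` in the question suggests), a plain line-by-line `json.loads` loop or pandas' own `chunksize` (below) is simpler:

```python
import ijson
import pandas as pd

frames = []
batch = []

with open('data.json', 'rb') as f:
    # ijson.items yields one parsed element of the top-level array
    # at a time, without loading the whole document into memory
    for record in ijson.items(f, 'item'):
        batch.append(record)
        if len(batch) >= 100000:  # flush every 100k records
            frames.append(pd.DataFrame(batch))
            batch = []

if batch:
    frames.append(pd.DataFrame(batch))

df = pd.concat(frames, ignore_index=True)
```

Concatenating all the batches still needs enough RAM for the final DataFrame; if even that is too large, process or write out each batch inside the loop instead.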

I am not a Python expert; however, there should be a good library that can already do this for you.
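
For newline-delimited JSON like the file in the question, recent versions of pandas can also do the streaming themselves: passing `chunksize` together with `lines=True` to `pd.read_json` returns an iterator of DataFrames instead of one huge frame. A minimal sketch (the chunk size of 100000 rows is an arbitrary choice):

```python
import pandas as pd

# chunksize only works together with lines=True (newline-delimited JSON);
# it yields DataFrames of at most 100000 rows instead of the whole file
for chunk in pd.read_json('data.json', lines=True, chunksize=100000):
    # process each chunk here (filter, aggregate, write out, ...)
    # instead of holding all 4GB in memory at once
    print(chunk.shape)
```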