
I'm trying to load an extremely large JSON file in Python. I've tried:

import json
data = open('file.json').read()
loaded = json.loads(data)

but the process gets killed with SIGKILL (presumably because it runs out of memory).

I've tried:

import pandas as pd
df = pd.read_json('file.json')

and I get an out-of-memory error.

I'd like to try ijson to stream the data and only pull a subset into memory at a time. However, you need to know the schema of the JSON file so that you know which events to look for, and I don't actually know the schema of my file (a rough sketch of what I mean is below the questions). So, I have two questions:

  1. Is there a way to load or stream a large JSON file in Python without knowing the schema? Or a way to convert a JSON file into another format (or load it into a PostgreSQL server, for example)?

  2. Is there a tool for spitting out what the schema of my JSON file is?
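
For reference, here is a rough sketch of the kind of ijson call I mean. The 'item' prefix is only a guess that assumes the top level is a JSON array, which is exactly the part I can't know without the schema:

import ijson

# Sketch only: 'item' assumes the top level of file.json is a JSON array.
# If the structure is different, the prefix has to change, which is why
# I'd need to know the schema first.
with open('file.json', 'rb') as f:
    for record in ijson.items(f, 'item'):
        print(record)  # only one record is held in memory at a time
        break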

UPDATE:

I used head file.json to get an idea of what my JSON file looks like. From there it's a bit easier.

user1566200

2 Answers


I would deal with smaller pieces of the file. Take a look at Lazy Method for Reading Big File in Python?. You can adapt the proposed answer to parse your JSON object by object.
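
For instance, here is a rough sketch of what parsing "object by object" could look like, assuming (and this is only an assumption) that the file contains a sequence of concatenated or whitespace-separated JSON objects rather than one giant array. It keeps a small buffer and uses json.JSONDecoder.raw_decode to peel off one complete object at a time:

import json

def iter_json_objects(path, chunk_size=64 * 1024):
    # Lazily read the file and yield one decoded JSON object at a time.
    # Assumes the file is a stream of JSON objects, not a single huge array.
    decoder = json.JSONDecoder()
    buffer = ""
    with open(path) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buffer += chunk
            while True:
                buffer = buffer.lstrip()
                if not buffer:
                    break
                try:
                    obj, end = decoder.raw_decode(buffer)
                except ValueError:
                    # The next object is not complete yet; read more data.
                    break
                yield obj
                buffer = buffer[end:]

for obj in iter_json_objects('file.json'):
    ...  # process each object here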

V. BRICE
  • It is a good lead as a general answer, but it does not specify how to actually append the separate chunks to the same dataframe object to maintain the same data structure. – MattSom May 14 '20 at 15:05

You can read the file in chunks, something like this:

def read_in_chunks(f, chunk_size=1024):
    # Yield the file in fixed-size chunks instead of reading it all at once
    while True:
        data = f.read(chunk_size)
        if not data:
            break
        yield data

with open("file.json") as f:
    for chunk in read_in_chunks(f):
        ...  # process each raw text chunk here (not yet parsed JSON)

A line-by-line option (this works if the file is in JSON Lines format, i.e. one JSON object per line):

import json

data = []
with open('file') as f:
    for line in f:
        data.append(json.loads(line))

Also look at https://www.dataquest.io/blog/python-json-tutorial/

Also search for more answers about jsonlines (JSON Lines).
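
If the file is (or can be converted to) JSON Lines, pandas can also read it in chunks so that only part of it is in memory at once. A minimal sketch, assuming one JSON object per line and an arbitrarily chosen chunk size:

import pandas as pd

# Assumes file.json is JSON Lines (one object per line). chunksize makes
# read_json return an iterator of DataFrames instead of loading the whole
# file into a single frame.
for chunk in pd.read_json('file.json', lines=True, chunksize=100_000):
    print(chunk.shape)  # replace with your own filtering/aggregation/writing

This also addresses the comment above about keeping a DataFrame structure: you process (or concatenate a reduced version of) each chunk instead of building one huge frame.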