
I have a JSON file (api.json) containing a list of dictionaries from an API, like this:

[
{
    "column1": "value1",
    "column2": "value2",
    "column3": "value3"
},
{
    "column1": "value4",
    "column2": "value5",
    "column3": "{'something':'something'}"
},
{
    "column1": "value7",
    "column2": "value8",
    "column3": "value9"
}
]

Every dictionary in the list represents one row in a database. The list is large and I don't want to load it all into memory. How do I split the file into multiple smaller files (without resorting to bash), each containing a list of no more than 1000 dictionaries? According to https://stackoverflow.com/a/6475340/8156638 I can read the file line by line, but how do I split it?

PS: When I try to use json.load() I get a MemoryError.

PaszaVonPomiot
  • You need a streaming json parser - [here](https://pypi.org/project/ijson/) is the first one I found (I haven't tried it personally though). – Shadow Oct 04 '18 at 23:05
  • Yes, streaming parser sounds right for the job. However I can't seem to make it split the file. I've also found another parser "yajl-py" but I need to read the docs first. – PaszaVonPomiot Oct 06 '18 at 00:28

1 Answer


Split by keeping the structure.

You have an Array, denoted by the outer square brackets: []

Then, you have objects, denoted by the curly brackets: {}

Split the data into different files by creating a separate array in each one:

File A:

[
  {
    "column1": "value1",
    "column2": "value2",
    "column3": "value3"
  },
  {
    "column1": "value4",
    "column2": "value5",
    "column3": "{'something':'something'}"
  }
]

File B:

[
  {
    "column1": "value7",
    "column2": "value8",
    "column3": "value9"
  }
]

Each file is then a valid JSON document on its own, so you can read them back individually.
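To produce these files without ever holding the whole array in memory, you can stream one object at a time off the top-level array. The ijson package mentioned in the comments is the more robust option; as a standard-library-only sketch, the helper below (the name `split_json_array` and the `partNNNN.json` file naming are my own, not from the question) feeds the file through `json.JSONDecoder.raw_decode` in small blocks:

```python
import json
import os

def split_json_array(path, out_dir, chunk_size=1000):
    """Stream dictionaries out of a top-level JSON array in `path` and write
    them to part0000.json, part0001.json, ... in `out_dir`, each file holding
    a JSON list of at most `chunk_size` dictionaries. Returns the file count."""
    decoder = json.JSONDecoder()
    buf, chunk, part = "", [], 0

    def flush():
        nonlocal part, chunk
        out_path = os.path.join(out_dir, "part%04d.json" % part)
        with open(out_path, "w", encoding="utf-8") as out:
            json.dump(chunk, out, indent=2)
        part += 1
        chunk = []

    with open(path, "r", encoding="utf-8") as f:
        # Consume everything up to and including the array's opening '['.
        while True:
            c = f.read(1)
            if not c or c == "[":
                break
        done = False
        while not done:
            piece = f.read(8192)        # read the file in small blocks
            if not piece:
                break                   # EOF
            buf = (buf + piece).lstrip()
            while buf:
                if buf[0] == ",":       # separator between objects
                    buf = buf[1:].lstrip()
                    continue
                if buf[0] == "]":       # end of the top-level array
                    done = True
                    break
                try:
                    obj, end = decoder.raw_decode(buf)
                except ValueError:
                    break               # partial object: read more data first
                chunk.append(obj)
                buf = buf[end:].lstrip()
                if len(chunk) >= chunk_size:
                    flush()
    if chunk:
        flush()
    return part
```

Only `chunk_size` dictionaries plus one read block are ever in memory at once, so this stays flat regardless of how large api.json is.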

Octavio Galindo