
I have some big json files with the following structure:

[
  {
    "url": "",
    "publishedDate": "",
    "modifiedDate": "",
    "title": "",
    "summary": "",
    "content": "",
    "language": "",
    "section": "",
    "tags": [],
    "authors": []
  },
  {
    "url": "",
    "publishedDate": "",
    "modifiedDate": "",
    "title": "",
    "summary": "",
    "content": "",
    "language": "",
    "section": "",
    "tags": [],
    "authors": []
  },
  ...
]

But deserializing these big JSON files with Python's default json library ends up consuming too much memory, so I've searched for alternatives. One such alternative is ijson, which is supposed to consume only about as much memory as the file size itself.

The problem is, I don't know how to use it (I'm new to Python, coming from Java), and most tutorials I've found don't parse JSON like the one above. How can I make ijson yield a dictionary for each item in the JSON's top-level list?

Thanks in advance.

loko
  • Have you checked [the `ijson` documentation](https://pypi.org/project/ijson/#high-level-interfaces)? There's an example that's quite similar to what you are trying to do here. – Thomas Feb 04 '21 at 15:01
  • Yes, I've seen the `kvitems` method but wasn't sure how to use it for this json. Thanks still. – loko Feb 04 '21 at 15:05
