
I am starting a new project where I have to load extremely large JSON files into an Oracle database (or PostgreSQL, though that's not very important).

The files are approximately 100-150 GB and hold 13 different arrays, each at least 500 million lines long. Fortunately, the structure isn't very deep: most objects are just one level deep, but with a lot of properties.

I am very experienced with databases, but new to Python and Json.

The size of the files means that I cannot just use the most common Python libraries, or at least that I have to use them in ways that aren't very intuitive to me as a beginner.

I have a working solution for my 1 MB test JSON, but it is not scalable, and I have not been able to produce a solution for the real file due to its size.

I would be eternally grateful if someone could point me in the right direction regarding useful libraries, solutions, or helpful websites for:

  1. Loading, parsing and traversing such huge files.
  2. A good Pythonic way to load the data into the database, as I suspect 800 million individual inserts are not the ideal approach.

If more info is needed, I will gladly expand the topic.

Melan P
  • Does this answer your question? [Reading rather large JSON files](https://stackoverflow.com/questions/10382253/reading-rather-large-json-files) – Ahmed AEK Oct 26 '22 at 19:48
  • Can you provide some sample input & output? – Buddhi Oct 26 '22 at 19:56
  • Just an idea: open a stream instead of reading the entire file at once using [this solution](https://stackoverflow.com/questions/10382253/reading-rather-large-json-files/10382359#10382359), then use [peewee](http://docs.peewee-orm.com/en/latest/), which can place dictionaries into tables as rows. Use its `insert_many()` function to further optimize the chunks of data you are entering. – Zack Walton Oct 26 '22 at 20:00
  • @AhmedAEK Thanks. Strange that I didn't stumble over that topic when searching. I will give it a try. It might solve the JSON part, but not the database part. – Melan P Oct 26 '22 at 20:14
  • What's the DB datatype? What DB version? See [Streaming LOBs (Write)](https://python-oracledb.readthedocs.io/en/latest/user_guide/lob_data.html#streaming-lobs-write). – Christopher Jones Oct 26 '22 at 22:29
  • It's unclear. Do you want to parse a JSON list, convert each element into DB column values, and insert them into a table? What is your main concern: parsing the JSON file or inserting rows into the DB? Do you want to load the whole file into memory at once? – relent95 Oct 27 '22 at 02:34
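The batched-insert idea raised in the comments can be sketched with `executemany()`, which all the relevant drivers expose. sqlite3 is used below only as a runnable stand-in: python-oracledb and psycopg2 cursors offer the same `executemany()` call, with `:1`/`%s` placeholders instead of `?`, and the table/column names here are made up for illustration:

```python
import sqlite3
from itertools import islice

def batches(iterable, size):
    """Yield lists of up to `size` items from any iterator (e.g. a streaming parser)."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")

# Stand-in for records streamed out of the JSON file, one dict at a time.
records = ({"id": i, "name": f"row{i}"} for i in range(10_000))

# One round trip per 1,000 rows instead of one per row.
for batch in batches(records, 1_000):
    conn.executemany(
        "INSERT INTO items (id, name) VALUES (?, ?)",
        [(r["id"], r["name"]) for r in batch],
    )
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM items").fetchone()[0])  # → 10000
```

Tuning the batch size (thousands to tens of thousands of rows) and committing once per batch is usually the main lever; anything is a large improvement over hundreds of millions of single-row inserts.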

0 Answers