I am starting a new project where I have to load extremely large JSON files into an Oracle database (or PostgreSQL, though that's not very important).
The files are approximately 100-150 GB, holding 13 different arrays, each at least 500 million lines long. Fortunately, the structure isn't very deep: most of it is just one level, but with a lot of properties.
I am very experienced with databases, but new to Python and JSON.
The size of the files means that I cannot just use the most common Python libraries, or at least that I have to use them in ways that are not very intuitive to me as a beginner.
I have a working solution for my 1 MB test JSON, but it does not scale, and I have not been able to produce a solution for the real files because of their size.
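For reference, this is roughly what my current (non-scalable) approach looks like; file name, array key, table and column names are placeholders for illustration:

```python
# Minimal sketch of what I do for the 1 MB test file: read the whole document
# into memory with json.load() and insert row by row. Names are placeholders.
import json

import oracledb  # python-oracledb driver

with open("test.json", "r", encoding="utf-8") as f:
    data = json.load(f)  # loads the entire file into memory at once

conn = oracledb.connect(user="scott", password="tiger", dsn="localhost/orclpdb1")
cur = conn.cursor()

for row in data["customers"]:  # one of the 13 arrays
    cur.execute(
        "INSERT INTO customers (id, name, city) VALUES (:1, :2, :3)",
        (row["id"], row["name"], row["city"]),
    )

conn.commit()
```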
I would be eternally grateful if someone could guide me in the right direction with regard to useful libraries, solutions, or helpful websites for:
- Loading, parsing and traversing such huge files.
- A good Pythonic way to load them into the database, as I suspect 800 million individual inserts are not the ideal approach (see the sketch after this list for the kind of thing I am imagining).
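From what I have read so far, the direction might be a streaming parser such as ijson combined with batched `executemany()` calls, but I have not managed to get anything like this working at scale and I am not sure these are the right tools. A hedged sketch of what I am imagining, with file name, array prefix, table and column names as placeholders:

```python
# Sketch only: stream one of the arrays with ijson instead of loading the
# whole file, and insert in batches with executemany(). All names are
# placeholders; I don't know whether this is the recommended approach.
import ijson

import oracledb

BATCH_SIZE = 50_000

conn = oracledb.connect(user="scott", password="tiger", dsn="localhost/orclpdb1")
cur = conn.cursor()
sql = "INSERT INTO customers (id, name, city) VALUES (:1, :2, :3)"

with open("huge.json", "rb") as f:
    batch = []
    # "customers.item" means: each element of the top-level "customers" array
    for obj in ijson.items(f, "customers.item"):
        batch.append((obj["id"], obj["name"], obj["city"]))
        if len(batch) >= BATCH_SIZE:
            cur.executemany(sql, batch)
            conn.commit()
            batch.clear()
    if batch:  # flush the last partial batch
        cur.executemany(sql, batch)
        conn.commit()
```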
If more info is needed, I will gladly expand on the topic.