
I have a JSON file of 38 GB and I need to convert it into a table and move it to a database.

The problem here is the size of the file; to solve it, I thought I'd split the file into smaller files.

So for that I use this code:

import os
import json

# add your path here
with open(os.path.join('folder/data.json'), 'r',
          encoding='utf-8') as f1:
    ll = [json.loads(line.strip()) for line in f1.readlines()]

    # total number of objects in the JSON file
    print(len(ll))

    # size_of_the_split is the number of tweets per split file;
    # define your own split size according to your need
    size_of_the_split = 1000000
    total = len(ll) // size_of_the_split

    # number of splits
    print(total + 1)

    for i in range(total + 1):
        with open("result/data_split" + str(i + 1) + ".json", 'w',
                  encoding='utf8') as f2:
            json.dump(ll[i * size_of_the_split:(i + 1) * size_of_the_split],
                      f2, ensure_ascii=False, indent=1)

It works fine with any other JSON file, but with this one I get a MemoryError.

Any advice?

Fatima
  • It appears the JSON file isn't *one giant* JSON object, but JSON Lines with one JSON object per line?! Then simply process the file line by line; don't dump it all into memory with that list comprehension. This might mean you need to iterate over the file twice if you need the total length first, but you can do that once simply to count the lines, and then iterate over it again, parsing lines one by one. – deceze Dec 06 '22 at 13:02
  • @deceze Could you please provide a code example? – Fatima Dec 06 '22 at 13:27
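A minimal sketch of the line-by-line approach described in the comment above, assuming the file really is JSON Lines (one object per line); input_path, output_dir, and size_of_the_split are placeholders mirroring the question's code:

import os
import json

input_path = 'folder/data.json'   # placeholder path from the question
output_dir = 'result'
size_of_the_split = 1000000

os.makedirs(output_dir, exist_ok=True)

# First pass: count the lines without loading the file into memory.
with open(input_path, 'r', encoding='utf-8') as f:
    total_lines = sum(1 for _ in f)
print(total_lines)

def write_split(chunk, split_no):
    # write one chunk of parsed objects as a JSON array
    out_path = os.path.join(output_dir, f'data_split{split_no}.json')
    with open(out_path, 'w', encoding='utf8') as out:
        json.dump(chunk, out, ensure_ascii=False, indent=1)

# Second pass: stream the file, parsing one line at a time and flushing
# a split file whenever the chunk fills up. At most one chunk of
# size_of_the_split objects is ever held in memory.
with open(input_path, 'r', encoding='utf-8') as f:
    chunk = []
    split_no = 0
    for line in f:
        line = line.strip()
        if not line:
            continue
        chunk.append(json.loads(line))
        if len(chunk) == size_of_the_split:
            split_no += 1
            write_split(chunk, split_no)
            chunk = []
    if chunk:  # remaining objects that didn't fill a whole split
        split_no += 1
        write_split(chunk, split_no)

The first pass is only needed if the total count matters up front; if it doesn't, skip it and do a single streaming pass.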

0 Answers