
I have a large JSON file (6 GB) which contains simple key-value pairs like:

 { "0546585b451000" : "5",
   "0546585b451000111222" : "10"
 }

I am using ijson to parse this file and perform some operation on each object.

I want to delete every object from the JSON file itself after its iteration completes.

    import ijson

    with open(SOURCE_JSON_FILE, 'r') as fd:
        parser = ijson.parse(fd)
        for prefix, event, value in parser:
            if event == 'number':
                print('prefix={}, event={}, value={}'.format(prefix, event, value))

            ## Delete this row from the json file now

My intention is to minimize the size of the actual JSON file so that, if the process breaks in between, I can continue with the remaining keys.

What should be the approach to achieve this, apart from dumping finished objects into another file or database?

Help is appreciated.

Shri
  • Apparently the best way to delete the first few lines in Python requires loading the whole file into memory and writing the rest of the file again: https://stackoverflow.com/questions/20364396/how-to-delete-the-first-line-of-a-text-file-using-python . This will be very expensive. You could consider using a simple shell script to achieve this rather than Python. – Gautam Dec 05 '18 at 07:41
  • 1
    I suppose, rewriting the file each time you remove a small portion of it isn't a very good idea. It will generate unnecessary high workload on your drive subsystem. If needed, try saving the last processed key somewhere to know the offset in the original file. – Mikhail Zakharov Dec 05 '18 at 07:47
  • 1
    as @Gautam says, don't try to parse the json structure, since you need to read it completely into memory. Instead, do a 'string' search and replace operation on the file this is more brittle, but consumes much lkess memory. A quick and dirty idea, based on your data: `perl -ne 'BEGIN{$/="},"} print "$1\n" if /(\{\s*\"0546585b451000\"\s*:[^,]+\,\s*\"0546585b451000111222\"[^}]+\})/mi' < DATA.json` – murphy Dec 05 '18 at 08:04
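Following the comment suggestion to record the last processed key instead of shrinking the 6 GB file, a minimal sketch of the resume logic. This is an illustrative assumption, not the asker's code: the `skip_done` helper is hypothetical, the demo parses a small in-memory object with `json.loads`, and with the real file the pairs would instead come from `ijson.kvitems(fd, '')` and the checkpoint key would be persisted to a small side file between runs:

```python
import json

def skip_done(pairs, last_done=None):
    """Yield (key, value) pairs, skipping everything up to and including
    the checkpointed key left behind by a previous, interrupted run."""
    skipping = last_done is not None
    for key, value in pairs:
        if skipping:
            if key == last_done:
                skipping = False  # resume with the key after this one
            continue
        yield key, value

# Demo with a small in-memory object standing in for the 6 GB file.
data = json.loads('{"0546585b451000": "5", "0546585b451000111222": "10"}')

first_run = list(skip_done(data.items()))
checkpoint = first_run[0][0]  # pretend the process crashed after the first key

# Simulated restart: only keys after the checkpoint are reprocessed.
second_run = list(skip_done(data.items(), last_done=checkpoint))
print([k for k, _ in second_run])  # -> ['0546585b451000111222']
```

This avoids rewriting the large file entirely: only a few bytes (the last finished key) are written per iteration, and a restart seeks past the done portion by key rather than by truncating the source.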

0 Answers