sed not working on large file [Looking for other options]

Question

I have a gigantic JSON file that was accidentally written without a newline character between the JSON entries, so the whole file is treated as one giant line. I tried to do a find and replace with sed to insert a newline before each entry:

sed 's/{"seq_id"/\n{"seq_id"/g' my_giant_json.json

It doesn't output anything.

However, I know my sed expression is correct, because it works fine when I run it on just a small part of the file:

head -c 1000000 my_giant_json.json |  sed 's/{"seq_id"/\n{"seq_id"/g'

I have also tried using Python with this gnarly one-liner:

'\n{"seq_id'.join(open(json_file,'r').readlines()[0].split('{"seq_id')).lstrip()

But this loads the entire file into memory because of the readlines() call. I don't know how to iterate through a giant single line in chunks and do a find and replace.

Any thoughts?

Read N characters at a time. Check this answer for ideas https://stackoverflow.com/questions/2988211/how-to-read-a-single-character-at-a-time-from-a-file-in-python — CrowbarKZ, Jan 22 '18 at 19:28
@snakecharmb - not a duplicate. That would work if there were a constant number of lines between objects, but unfortunately there isn't. — jwillis0720, Jan 22 '18 at 19:36
@CrowbarKZ That might work if I write the file out in chunks. Will get back. — jwillis0720, Jan 22 '18 at 19:36

Bo Borgerson · Accepted Answer · 2018-01-22T20:02:49.347

Perl will let you change the input separator ($/) from newline to another character. You could take advantage of this to get some convenient chunking.

perl -pe'BEGIN{$/="}"}s/^({"seq_id")/\n$1/' my_giant_json.json

That sets the input separator to be "}". Then it looks for chunks that start with {"seq_id" and prefixes them with a newline.

Note that it puts an unnecessary empty line at the beginning. You could complicate the program to eliminate that or just delete it manually after.
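
For the Python route the question asks about, the same chunking idea can be sketched with fixed-size reads instead of a custom separator. This is only a minimal sketch, not a tested replacement for the Perl one-liner: split_records, chunk_size and the output name fixed.json are hypothetical names, and it assumes the file contains no newlines of its own and that every record starts with {"seq_id". It carries the last few characters of each chunk over to the next read, so a marker split across two chunks is still found, and it does not write a newline before the very first record.

```python
MARKER = '{"seq_id"'  # record prefix taken from the question


def split_records(src, dst, marker=MARKER, chunk_size=1 << 20):
    """Copy src to dst, putting each occurrence of marker on a new line.

    src and dst are text-mode file objects. Only about one chunk is held
    in memory at a time, so the giant single line is never loaded whole.
    """
    keep = len(marker) - 1  # longest tail that could be a partial marker
    buf = ""
    started = False         # becomes True once anything has been written
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        pos = 0
        while True:
            idx = buf.find(marker, pos)
            if idx == -1:
                break
            if idx > pos:
                dst.write(buf[pos:idx])
                started = True
            if started:
                dst.write("\n")  # no newline before the very first record
            dst.write(marker)
            started = True
            pos = idx + len(marker)
        # What is left has no complete marker. Hold back a short tail in
        # case a marker is split across this chunk and the next one.
        rest = buf[pos:]
        cut = max(0, len(rest) - keep)
        if cut:
            dst.write(rest[:cut])
            started = True
        buf = rest[cut:]
    dst.write(buf)  # flush whatever tail remains at end of input


# Hypothetical usage (fixed.json is an assumed output name):
# with open("my_giant_json.json", encoding="utf-8") as src, \
#         open("fixed.json", "w", encoding="utf-8") as dst:
#     split_records(src, dst)
```

Writing to a separate output file rather than editing in place keeps the original intact in case the marker assumption turns out to be wrong for some records.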
