Scan and merge split lines in a huge file (>5 GB)

Question

I have a huge file (>5 GB) containing a number of broken lines. To fix it I need to:

1. Find the 'split' lines
2. Merge each pair of 'split' lines back into the intended single line
3. Save the corrected file

Original file (note the 'split' entries #113 and #114):

...
#109=CARTESIAN_POINT('',(1.705232012855E0,-7.756877070089E-1,2.48166921056E0));
#110=CARTESIAN_POINT('',(1.705861274751E0,-7.7602308423645E-1,2.480686063358E0));
#111=CARTESIAN_POINT('',(1.705767565089E0,-7.764706427305E-1,2.472310353831E0));
#112=CARTESIAN_POINT('',(1.70570123242E0,-7.767839147852E-1,2.478226532593E0));
#113=CARTESIAN_POINT('',(1.7015612304515E0,-7.96452125292859E-1,
2.416457902634E0));
#114=CARTESIAN_POINT('',(1.701554931826E0,-7.9649012320387E-1,
2.4163429213930E0));
#115=CARTESIAN_POINT('',(1.705923512855E0,-7.756877070089E-1,2.481645657056E0));
#116=CARTESIAN_POINT('',(1.7058612374751E0,-7.7600123423645E-1,2.48068604563358E0));
...

Expected result:

...
#109=CARTESIAN_POINT('',(1.705232012855E0,-7.756877070089E-1,2.48166921056E0));
#110=CARTESIAN_POINT('',(1.705861274751E0,-7.7602308423645E-1,2.480686063358E0));
#111=CARTESIAN_POINT('',(1.705767565089E0,-7.764706427305E-1,2.472310353831E0));
#112=CARTESIAN_POINT('',(1.70570123242E0,-7.767839147852E-1,2.478226532593E0));
#113=CARTESIAN_POINT('',(1.7015612304515E0,-7.96452125292859E-1,2.416457902634E0));
#114=CARTESIAN_POINT('',(1.701554931826E0,-7.9649012320387E-1,2.4163429213930E0));
#115=CARTESIAN_POINT('',(1.705923512855E0,-7.756877070089E-1,2.481645657056E0));
#116=CARTESIAN_POINT('',(1.7058612374751E0,-7.7600123423645E-1,2.48068604563358E0));
...

I think this is possible with some combination of cut/paste/sed commands in a Unix/Linux terminal, but I don't know how.

Thanks in advance!

Does this answer your question? [sed: joining lines depending on the second one](https://stackoverflow.com/questions/9999934/sed-joining-lines-depending-on-the-second-one) — oguz ismail, Jan 09 '21 at 11:34

score 1 · Answer 1 · answered Jan 09 '21 at 11:31

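With GNU sed, you can use `N` to append the next line to the pattern space, then check whether the embedded newline is followed by a character other than `#`. If it is, the newline is removed, which merges the two lines. `P` then prints the first line of the pattern space and `D` deletes it before the next cycle: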

sed -E 'N;s/\n([^#])/\1/;P;D;' file
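For a file this size it can be safer to write the result to a new file instead of editing in place. The following is a minimal sketch, not part of the original answer: the helper name `merge_split_lines` and the shortened sample values are made up for illustration. It uses `$!N` instead of `N` so that the last line is still printed by sed implementations that quit when `N` finds no further input (GNU sed prints it either way), and it assumes that stripping carriage returns from the whole file with `tr` is acceptable.

```shell
#!/bin/sh
# Hypothetical helper: join every line that does not start with '#'
# onto the line before it. Reads stdin, writes stdout.
merge_split_lines() {
    # Assumption: converting CRLF line endings to LF is acceptable.
    # $!N appends the next line except on the last line of input;
    # the substitution drops the newline when the appended line
    # does not begin with '#'; P prints and D deletes the first line.
    tr -d '\r' | sed -E '$!N;s/\n([^#])/\1/;P;D;'
}

# Demonstration on inline sample data (values shortened for illustration).
printf '%s\n' \
    "#112=CARTESIAN_POINT('',(1.0E0,2.0E0,3.0E0));" \
    "#113=CARTESIAN_POINT('',(1.0E0,2.0E0," \
    "3.0E0));" \
    "#114=CARTESIAN_POINT('',(4.0E0,5.0E0,6.0E0));" |
    merge_split_lines
```

The demonstration prints three lines, with entry #113 back on a single line. On the real file the helper could be called as `merge_split_lines < filename.stp > filename_fixed.stp`, where `filename_fixed.stp` is a placeholder name. Note that any line not beginning with `#` is joined to the previous one, so this assumes every intact line of the file looks like the entries in the sample.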

SLePort

Thanks! I installed [GNU sed](https://www.gnu.org/software/sed/#download) (hoping that is what you referred to) and ran your suggestion in Terminal. It ran through the file, but I couldn't see any changes when I opened it afterwards, so maybe I am missing a command to save the changes? Otherwise I cannot confirm it is working :/ I'm not really familiar with this kind of text transformation. – Julián Jaramillo Jan 09 '21 at 12:58

By default, this command writes to standard output (your terminal). If you want to edit your file in place, add the `-i` option: `sed -E -i 'N;s/\n([^#])/\1/;P;D;' file` – SLePort Jan 09 '21 at 13:07
Thanks again! I think I'm still getting something wrong somehow. In Terminal, I go to the folder where the file is located and run this exact code: `sed -E -i .bak 'N;s/\n([^#])/\1/;P;D;' filename.stp`. It seems to run and it creates a backup file, but the backup is still identical to the supposedly fixed file. Am I still doing something wrong? Thanks in advance! – Julián Jaramillo Jan 09 '21 at 17:52
Your input file may contain Windows line endings (`CR+LF` or `\r\n`). If so, please try instead: `sed -E -i .bak 'N;s/\r?\n([^#])/\1/;P;D;' file` – tshiono Jan 12 '21 at 01:27
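One way to confirm whether the merge changed anything, sketched under assumptions: count the lines that do not start with `#` before and after running sed. The helper name `count_split_lines` and the temporary file names are hypothetical, and the check assumes that every intact line of the file begins with `#`, as in the sample in the question.

```shell
#!/bin/sh
# Hypothetical helper: print how many lines of a file do not begin with '#'.
count_split_lines() {
    # n+0 forces a numeric 0 when no line matched.
    awk '!/^#/ { n++ } END { print n+0 }' "$1"
}

# Demonstration on a tiny made-up file with one split entry.
tmp=$(mktemp -d)
printf '%s\n' '#1=A(1,' '2);' '#2=B(3);' > "$tmp/in.stp"

# Write the merged result to a new file instead of editing in place.
sed -E '$!N;s/\n([^#])/\1/;P;D;' "$tmp/in.stp" > "$tmp/out.stp"

echo "split lines before: $(count_split_lines "$tmp/in.stp")"
echo "split lines after:  $(count_split_lines "$tmp/out.stp")"
rm -r "$tmp"
```

The demonstration reports 1 split line before and 0 after. If the count on the real output does not drop, the lines are probably not being matched at all, which would be consistent with the line-ending issue raised in the comment above.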
