Remove last character of previous line under condition

Question

Using bash, I am trying to remove the trailing "&" from a line of the following piece of F90 code if the next line begins with " AA" (note the whitespace before AA).

     F = 2 * 3 * a * b * 7&
    & * 3 * b * c&
     AA = ...

should become

     F = 2 * 3 * a * b * 7&
    & * 3 * b * c
     AA = ...

A similar problem is discussed in "Bash - Remove the last character of the line this before?". Based on that, I tried

perl -0pe 's/\&\n\s*AA/\nAA/g' $MYFILE

and also

sed -i 's/\&\n\s*AA/\nAA/g' $MYFILE

Neither command produces an error, but neither changes anything. I also tried without \s*.

John1024 · Accepted Answer · 2016-07-18T17:14:19.903

Using sed

Using GNU sed:

$ sed -z 's/&\n AA/\n AA/g' file
 F = 2 * 3 * a * b * 7&
& * 3 * b * c
 AA = ...

To keep this command simple, we use the -z option to read in the whole file at once. (Technically, -z reads in NUL-separated input. Since no valid Fortran file contains a NUL, this has the effect of reading in the whole file.)

s/&\n AA/\n AA/g does the substitution that we want. Any place where the file contains & followed by newline followed by space followed by AA, this substitution removes the &.

Reading the whole file in at once is not a good approach if the file is too big to fit in memory. This should not be a problem for Fortran files.
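
The question's sample indents AA by five spaces, while the pattern above matches exactly one. As a minimal sketch, assuming GNU sed and a scratch file created with mktemp (the variable names are illustrative, not from the question), the same idea can accept any run of spaces or tabs before AA:

```shell
# Hypothetical scratch file holding the sample from the question.
sample=$(mktemp)
printf '%s\n' '     F = 2 * 3 * a * b * 7&' '    & * 3 * b * c&' '     AA = ...' > "$sample"

# -z (GNU sed) treats the whole file as a single record.
# [[:blank:]]* matches any run of spaces or tabs before AA.
# The capture group puts the newline and indentation back, minus the &.
result=$(sed -z 's/&\(\n[[:blank:]]*AA\)/\1/g' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

Only the & directly before the AA line is dropped; the continuation & on the first line is untouched because the line after it does not start with AA.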

For non-GNU sed (BSD, OSX), we need to add code to replace the -z flag. A capture group puts the newline back, because BSD sed does not interpret \n in the replacement:

sed 'H;1h;$!d;x;  s/&\(\n AA\)/\1/g' file
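
For readers unfamiliar with the hold-space idiom, here is the same kind of script spread over several lines with comments. It is a sketch only: the scratch file and variable names are assumptions, and the substitution is written with a capture group so that the replacement side contains no \n.

```shell
# Hypothetical scratch file with the single-space sample used in this answer.
sample=$(mktemp)
printf '%s\n' ' F = 2 * 3 * a * b * 7&' '& * 3 * b * c&' ' AA = ...' > "$sample"

result=$(sed '
  # Append every line to the hold space, preceded by a newline.
  H
  # On line 1, overwrite the hold space instead, so it has no leading newline.
  1h
  # On every line but the last, delete the pattern space and start the next cycle.
  $!d
  # On the last line, swap: the pattern space now holds the whole file.
  x
  # Drop an & that is followed by newline, space, AA.
  s/&\(\n AA\)/\1/g
' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

Nothing is printed until the last line, at which point the accumulated text is edited once and written out.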

Using awk

$ awk '{if (/^ AA/) sub(/[&]$/, "", last); if (NR>1) print last; last=$0} END{print last}' file
 F = 2 * 3 * a * b * 7&
& * 3 * b * c
 AA = ...

How it works:

This script uses a single variable, last, which holds the contents of the previous line. If the current line starts with " AA", we remove the final &, if present, from last. In more detail:

if (/^ AA/) sub(/[&]$/, "", last)

If the current line starts with " AA" (a space followed by AA), remove the final & from the previous line.

if (NR>1) print last

If we are not on the first line, print the previous line.

last=$0

Save the current line as last.

END{print last}

After we reach the end of the file, print last.
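
The steps above can also be written as separate pattern-action rules. The following is a sketch rather than the answer's exact command: it assumes a scratch file from mktemp, uses [ \t]* so that any amount of leading whitespace before AA is accepted, and adds an NR check in END (my addition) so that empty input prints nothing.

```shell
# Hypothetical scratch file holding the sample from the question.
sample=$(mktemp)
printf '%s\n' '     F = 2 * 3 * a * b * 7&' '    & * 3 * b * c&' '     AA = ...' > "$sample"

result=$(awk '
  # Current line is optional blanks then AA: strip a trailing & from the held line.
  /^[ \t]*AA/ { sub(/&$/, "", last) }
  # From the second line on, the held (previous) line is final and can be printed.
  NR > 1 { print last }
  # Hold the current line until the next one has been inspected.
  { last = $0 }
  # Flush the final held line; the NR check avoids a blank line for empty input.
  END { if (NR > 0) print last }
' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

Because only one line is held at a time, memory use does not grow with the size of the file.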

Changing files in-place

With GNU sed:

sed -zi.bak 's/&\n AA/\n AA/g' file

With other sed:

sed -i.bak 'H;1h;$!d;x;  s/&\(\n AA\)/\1/g' file

With recent GNU awk:

awk -i inplace '{if (/^ AA/) sub(/&$/, "", last); if (NR>1) print last; last=$0} END{print last}' file

With older awk or non-GNU awk:

awk '{if (/^ AA/) sub(/&$/, "", last); if (NR>1) print last; last=$0} END{print last}' file >file.tmp && mv file.tmp file

That works perfectly fine. Many thanks for the answer and the elaborate explanation! Indeed, my F90 file comprises about 10,000 lines, so the first option is less optimal, yet it works! It worked after some trial and error, since I had forgotten the -i flag (as pointed out by ikegami). — Tom, Jul 13 '16 at 19:35
10,000 lines? If your lines are no more than 100 characters long, you're talking about a file no larger than 10MB. That should easily fit in memory. Might as well go with the simpler and faster version that loads the entire file into memory. — ikegami, Jul 13 '16 at 20:36
`&` isn't an RE metacharacter so it doesn't need to be inside a bracket expression `[&]`. `$1 ~ /^AA/` would be more forgiving to leading white space than `/^ AA/`. — Ed Morton, Jul 14 '16 at 02:44

ikegami · Answer 2 · 2016-07-13T19:30:19.700

It becomes quite easy if you load the entire file into memory (which is what -0777 does).

perl -0777pe's/&(?=\n[^\S\n]*AA)//g'

Doing it without loading the entire file into memory is done using a sliding window.

perl -ne'$p=~s/&(?=\n)// if /^\s*AA/; print $p; $p=$_; END { print $p }'

or

perl -pe'print(/^\s*AA/ ? "\n" : $s) if $s; $s = s/&\n// ? $& : ""; END { print $s }'

All three accept any number of spaces and tabs before the AA.

Usage:

perl ... file.in >file.out    # From a file
perl ... <file.in >file.out   # From STDIN
perl -i~ ... file             # "In-place", with backup
perl -i ... file              # "In-place", without backup

That works just perfect. I didn't try the first option though due to memory concerns. Many thanks!!! — Tom, Jul 13 '16 at 19:36

potong · Answer 3 · score 0 · answered Jul 13 '16 at 17:54

This might work for you (GNU sed):

sed -r 'N;s/&([^&]*\n\s*AA)/\1/;P;D' file

Read two lines into the pattern space (PS) and, using pattern matching, remove the & from the first line if the second line begins with AA (ignoring leading whitespace).

N.B. This also caters for the second line containing an &.
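
The N;P;D cycle keeps a two-line window in the pattern space. Here is a commented sketch of that one-liner, assuming GNU sed (it relies on -r, \s, and on GNU sed printing the pattern space when N finds no further input); the scratch file and variable names are illustrative.

```shell
# Hypothetical scratch file holding the sample from the question.
sample=$(mktemp)
printf '%s\n' '     F = 2 * 3 * a * b * 7&' '    & * 3 * b * c&' '     AA = ...' > "$sample"

result=$(sed -r '
  # Append the next input line, so the pattern space holds two lines.
  N
  # Remove the last & of the first line when the second line is whitespace then AA.
  s/&([^&]*\n\s*AA)/\1/
  # Print up to the first newline, that is, the first line of the window.
  P
  # Delete that first line and restart the cycle with the remainder.
  D
' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

Each line is examined twice, once as the second line of the window and once as the first, which is what lets the script look one line ahead without reading the whole file.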

Thank you very much. Easy to implement in just one line! Since I am quite new to Bash, a reminder: one must not forget to save or rewrite the file, e.g. with the -i flag (as pointed out by ikegami). – Tom Jul 13 '16 at 19:39
