
Using bash, I am trying to remove the trailing "&" from a line of the following F90 code whenever the next line begins with " AA" (note the whitespace before AA).

     F = 2 * 3 * a * b * 7&
    & * 3 * b * c&
     AA = ...

should become

     F = 2 * 3 * a * b * 7&
    & * 3 * b * c
     AA = ...

A related suggestion was made before in "Bash - Remove the last character of the line". Based on this, I tried

perl -0pe 's/\&\n\s*AA/\nAA/g' $MYFILE

and also

sed -i 's/\&\n\s*AA/\nAA/g' $MYFILE

Neither command produces an error, but neither changes the file. I also tried without \s*.

Tom

3 Answers


Using sed

Using GNU sed:

$ sed -z 's/&\n AA/\n AA/g' file
 F = 2 * 3 * a * b * 7&
& * 3 * b * c
 AA = ...

To keep this command simple, we use the -z option to read in the whole file at once. (Technically, -z reads in NUL-separated input. Since no valid Fortran file contains a NUL, this has the effect of reading in the whole file.)

s/&\n AA/\n AA/g does the substitution that we want. Any place where the file contains & followed by newline followed by space followed by AA, this substitution removes the &.

Reading the whole file in at once is not a good approach if the file is too big to fit in memory. This should not be a problem for Fortran files.
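
The pattern above expects exactly one space before AA. As a hedged variation, assuming GNU sed and a hypothetical three-line sample file, the space can be replaced with [[:blank:]]* and a capture group so that any amount of indentation is accepted and preserved:

```shell
# Hypothetical sample input; the AA line is indented by six spaces.
sample=$(mktemp)
printf '%s\n' ' F = 2 * 3 * a * b * 7&' '& * 3 * b * c&' '      AA = 1' > "$sample"

# GNU sed only: -z reads the whole file as one record.
# \( ... \) captures the indentation plus AA so it can be put back with \1.
result=$(sed -z 's/&\n\([[:blank:]]*AA\)/\n\1/g' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

Only the & directly before the AA line is removed; the & ending the first line is left alone.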

For non-GNU sed (BSD, OSX), we need to add code to replace the -z flag:

sed 'H;1h;$!d;x;  s/&\n AA/\n AA/g' file

Using awk

$ awk '{if (/^ AA/) sub(/[&]$/, "", last); if (NR>1) print last; last=$0} END{print last}' file
 F = 2 * 3 * a * b * 7&
& * 3 * b * c
 AA = ...

How it works:

This script uses a single variable, last, which holds the contents of the previous line. If the current line starts with a space followed by AA, we remove the final & (if present) from last. In more detail:

  • if (/^ AA/) sub(/&$/, "", last)

    If the current line starts with a space followed by AA, then remove the final & from the previous line.

  • if (NR>1) print last

    If we are not on the first line, then print the previous line.

  • last=$0

    Save the current line as last.

  • END{print last}

    After we reach the end of the file, print last.
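
The same steps can be laid out as a multi-line awk program. This is only a sketch on a hypothetical sample file; it swaps the /^ AA/ test for `$1 ~ /^AA/` so that any leading whitespace is tolerated, which is a change from the one-liner above:

```shell
# Hypothetical sample input; the AA line is indented by three spaces.
sample=$(mktemp)
printf '%s\n' ' F = 2 * 3 * a * b * 7&' '& * 3 * b * c&' '   AA = 1' > "$sample"

result=$(awk '
    $1 ~ /^AA/ { sub(/&$/, "", last) }     # first field starts with AA: trim the trailing & from the held line
    NR > 1     { print last }              # print the held line, one step behind the input
               { last = $0 }               # hold the current line for the next iteration
    END        { if (NR > 0) print last }  # flush the final held line (skip if the input was empty)
' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

Because awk splits fields on runs of blanks by default, $1 is the first non-blank token regardless of how far the line is indented.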

Changing files in-place

With GNU sed:

sed -zi.bak 's/&\n AA/\n AA/g' file

With other sed:

sed -i.bak 'H;1h;$!d;x;  s/&\n AA/\n AA/g' file

With recent GNU awk:

awk -i inplace '{if (/^ AA/) sub(/&$/, "", last); if (NR>1) print last; last=$0} END{print last}' file

With older awk or non-GNU awk:

awk '{if (/^ AA/) sub(/&$/, "", last); if (NR>1) print last; last=$0} END{print last}' file >file.tmp && mv file.tmp file
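
For the temporary-file approach, a slightly more defensive sketch is shown here. The file names are hypothetical stand-ins, and the choice to copy the result back with cat (rather than mv) is an assumption made to keep the original file's permissions, since mktemp creates files readable only by the owner:

```shell
# Hypothetical stand-in for the real Fortran source file.
file=$(mktemp)
printf '%s\n' ' F = 2 * 3 * a * b * 7&' '& * 3 * b * c&' ' AA = 1' > "$file"

tmp=$(mktemp)
if awk '{if (/^ AA/) sub(/&$/, "", last); if (NR>1) print last; last=$0} END{print last}' "$file" > "$tmp"
then
    cp -p "$file" "$file.bak"   # keep a backup, similar to what -i.bak does
    cat "$tmp" > "$file"        # overwrite the contents, keeping the original permissions
fi
rm -f "$tmp"

result=$(cat "$file")
printf '%s\n' "$result"

rm -f "$file" "$file.bak"
```

The original is only overwritten if awk exits successfully, so a failed run leaves the source untouched.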
John1024
  • That works perfectly fine. Many thanks for the answer and elaborate explanation! Indeed, my F90 file comprises about 10,000 lines, so the first option is less optimal; yet working! Worked after some trial and error since I forgot the -i flag (as pointed out by ikegami). – Tom Jul 13 '16 at 19:35
  • 10,000 lines? If your lines are no more than 100 characters long, you're talking about a file no larger than 1MB. That should easily fit in memory. Might as well go with the simpler and faster version that loads the entire file into memory. – ikegami Jul 13 '16 at 20:36
  • `&` isn't an RE metacharacter so it doesn't need to be inside a bracket expression `[&]`. `$1 ~ /^AA/` would be more forgiving to leading white space than `/^ AA/`. – Ed Morton Jul 14 '16 at 02:44

It becomes quite easy if you load the entire file into memory, which is what -0777 does.

perl -0777pe's/&(?=\n[^\S\n]*AA)//g'

Doing it without loading the entire file into memory is done using a sliding window.

perl -ne'$p=~s/&(?=\n)// if /^\s*AA/; print $p; $p=$_; END { print $p }'

or

perl -pe'print(/^\s*AA/ ? "\n" : $s) if $s; $s = s/&\n// ? $& : ""; END { print $s }'

All three accept any number of spaces and tabs before the AA.

Usage:

perl ... file.in >file.out    # From a file
perl ... <file.in >file.out   # From STDIN
perl -i~ ... file             # "In-place", with backup
perl -i ... file              # "In-place", without backup
ikegami
  • That works just perfect. I didn't try the first option though due to memory concerns. Many thanks!!! – Tom Jul 13 '16 at 19:36

This might work for you (GNU sed):

sed -r 'N;s/&([^&]*\n\s*AA)/\1/;P;D' file

Read two lines into the pattern space (PS) and, using pattern matching, remove the & from the first line if the second line begins with AA (ignoring leading whitespace).

N.B. This also caters for the second line itself containing an &.
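
The two-line window can be tried on a small input as follows. This is a sketch on a hypothetical sample file, checked only against GNU sed; it uses a stricter pattern than the command above (the & must sit directly before the newline) and `$!N` so that the last line is not lost by sed implementations that quit when N finds no next line:

```shell
# Hypothetical sample input; the AA line is indented by one space.
sample=$(mktemp)
printf '%s\n' ' F = 2 * 3 * a * b * 7&' '& * 3 * b * c&' ' AA = 1' > "$sample"

# $!N  append the next line to the pattern space (except on the last line)
# s    drop an & that sits directly before the newline when the next line starts with AA
# P    print the first line of the pattern space
# D    delete that first line and restart the cycle with the remainder
result=$(sed -E '$!N;s/&(\n[[:blank:]]*AA)/\1/;P;D' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

At any moment the pattern space holds at most two lines, so memory use stays constant regardless of file size.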

potong
  • Thank you very much. Easy to implement in just one line! Since I am quite new to Bash one also must not forget to save or rewrite the new file using e.g. the -i flag (as pointed out by ikegami). – Tom Jul 13 '16 at 19:39