I have about 1000 files that contain xyz Cartesian coordinates of chemical structures; a sample is provided below:

Re                -0.87242200         -0.87371100        0.24194200   
Re                -1.38612300          1.83520600        0.44292100
Re                 1.78955700         -0.15746900        0.71425500

What I'd like to do, preferably through a 'for' loop, is add an extra line after the second occurrence of Re. That line should have the symbol 'H' in its first position, followed by xyz coordinates of the form 1.5+X 1.5+Y 1.5+Z, where X, Y and Z are the coordinates of the second Re. These coordinates should start at positions 20, 40 and 60 of the new line (for X, Y and Z respectively).

Jonathan Leffler
  • Welcome to Stack Overflow. Please read the [About] page soon. What you need to do seems nice and straightforward in `awk`. What did you try and how did it fail to do what you want? On SO, we will help you fix your attempt to solve a problem; we won't go out of our way to simply write the code for you. – Jonathan Leffler Nov 23 '15 at 09:05
  • In your example file, the 3rd column starts at 58. – Micha Wiedenmann Nov 23 '15 at 10:33
  • @Jonathan Leffler: Thanks. I'm still learning and not fully experienced with bash yet. I've already succeeded, using sed, to append a new line starting with 'H' after the second occurrence of Re. But I'm stuck at the next step, don't know yet how to extract the xyz values, I guess $2, $3, and $4 of Re, add the constant to the values and insert them as $2, $3 and $4 in the H line. –  Nov 23 '15 at 10:44
  • @ Micha Wiedenmann. Thanks for pointing that out. The 20, 40 and 60 positions are not strict, I just put it like that for simplicity. The exact space is not very important, the columns just have to be within a couple of spaces of each other. –  Nov 23 '15 at 10:46
  • When the task involves floating point arithmetic, neither `sed` nor Bash is the appropriate tool (Bash only supports integer arithmetic; Korn shell supports floating point). Awk is the next step up; you could use Perl or Python instead. – Jonathan Leffler Nov 23 '15 at 11:07

2 Answers

Given the following awk script:

BEGIN      { count = 0 }
/^\<Re\>/  { print
             # after the second Re line, add H shifted by 1.5 in x, y and z
             if (++count == 2)
                 printf "%-18s %-19s %-17s %s\n", "H", 1.5+$2, 1.5+$3, 1.5+$4
             next
           }
           { print }   # pass every other line through unchanged

you can run it on multiple files with:

for f in file*.txt; do
  gawk -i inplace -f add-H.awk -- "$f"
done

Note that this requires GNU awk 4.1.0 or later, which supports in-place modification through the inplace extension (see awk save modifications in place).
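If `gawk -i inplace` is not available, the same loop can be approximated with a temporary file. The following is only a sketch under a few assumptions: `add_h` is a hypothetical helper name, `mktemp` is assumed to be installed, and the pattern `/^Re /` stands in for the gawk-specific `\<Re\>` so that any POSIX awk can run it.

```shell
# add_h FILE  (hypothetical helper, not part of the original answer)
# Rewrites FILE so that an H line follows the second line starting with "Re ".
add_h() {
    tmp=$(mktemp) || return 1
    if awk '/^Re / { print
                     if (++count == 2)
                         printf "%-18s %-19s %-17s %s\n", "H", 1.5+$2, 1.5+$3, 1.5+$4
                     next }
                   { print }' "$1" > "$tmp"; then
        cat "$tmp" > "$1"    # overwrite in place, keeping the file's permissions
    fi
    rm -f "$tmp"
}

for f in file*.txt; do
    [ -e "$f" ] || continue  # skip the literal pattern when nothing matches
    add_h "$f"
done
```

Writing through `cat` rather than `mv` keeps the original file's owner and mode, at the cost of a second copy of the data.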

Micha Wiedenmann
  • You're a life saver! Because of your help I can still catch my allocation time on the University's server to run calculations; otherwise I'd have to wait for another week. Thanks a lot. –  Nov 23 '15 at 11:04
  • It's a long way of writing `awk '/^Re / { print; if (++count == 2) printf("%-18s %-19s %-19s %s\n", "H", $2+1.5, $3+1.5, $4+1.5) }'`. – Jonathan Leffler Nov 23 '15 at 11:12
  • @Jonathan Leffler. Nicer and simpler, Thanks. –  Nov 23 '15 at 11:17
  • @JonathanLeffler Please either update my answer or post a new answer, so I can delete mine. – Micha Wiedenmann Nov 23 '15 at 12:51

This is a task for Awk (or Perl or Python). Sed isn't suitable because it can't do arithmetic, and Bash isn't really suitable because it only does integer arithmetic. It could be done in Korn shell, which supports floating-point arithmetic, but Awk is probably the best tool for the task.
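As a small illustration of that point, here is a sketch that hands the floating-point addition to Awk from a shell script. The function name `add_offset` is hypothetical, and the 1.5 offset is taken from the question.

```shell
# add_offset VALUE  (hypothetical helper): print VALUE + 1.5 with 8 decimals.
# The shell only passes the string along; awk does the floating-point math.
# Bash's own $(( ... )) arithmetic would reject an operand such as -1.38612300.
add_offset() {
    awk -v x="$1" 'BEGIN { printf "%.8f\n", x + 1.5 }'
}

add_offset -1.38612300    # prints 0.11387700
```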

In the sample data, all the lines begin with Re. For such data, this is sufficient:

awk '/^Re / { print
              if (++count == 2)
                  printf("%-18s %-19s %-19s %s\n", "H", $2+1.5, $3+1.5, $4+1.5) }'

If there are other symbols at the start of a line that need to be printed, then you need:

awk '/^Re / { print
              if (++count == 2)
                  printf("%-18s %-19s %-19s %s\n", "H", $2+1.5, $3+1.5, $4+1.5)
              next }
            { print }'

The `next` skips the trailing `{ print }` which processes any other lines. That `{ print }` could be abbreviated to `1` or any other non-zero (true) value which triggers the default action, namely `print`. With the addition of a couple of semicolons, either script could be squished onto a single line, but I think the clarity of multiple lines is better.

awk '/^Re / { print; if (++count == 2) printf("%-18s %-19s %-19s %s\n", "H", $2+1.5, $3+1.5, $4+1.5); next } { print }'

If you need to control the number of decimal places printed, you can use `%-19.8f` or `%+-19.8f` instead of the `%-19s` and `%s` conversion specifications.
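A sketch of the fixed-precision variant, run on the sample data from the question. The wrapper name `add_h_fixed` is hypothetical, and the last field uses `%.8f` rather than `%-19.8f` only to avoid trailing spaces. With `%s`, Awk would normally format these sums with about six significant digits, so `%f` is what keeps all eight decimals.

```shell
# add_h_fixed [FILE...]  (hypothetical wrapper): like the scripts above, but
# prints the H coordinates with exactly 8 decimal places. Reads stdin when
# no file is given.
add_h_fixed() {
    awk '/^Re / { print
                  if (++count == 2)
                      printf("%-18s %-19.8f %-19.8f %.8f\n", "H", $2+1.5, $3+1.5, $4+1.5)
                  next }
                { print }' "$@"
}

add_h_fixed <<'EOF'
Re                -0.87242200         -0.87371100        0.24194200
Re                -1.38612300          1.83520600        0.44292100
Re                 1.78955700         -0.15746900        0.71425500
EOF
```

For this input the inserted H line carries the values 0.11387700, 3.33520600 and 1.94292100.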

Jonathan Leffler