
I have a type of data file that contains the following block of text only once (!):

Begin final coordinates
     new unit-cell volume =    460.57251 a.u.^3 (    68.24980 Ang^3 )
     density =      7.37364 g/cm^3

CELL_PARAMETERS (alat=  7.29434300)
   0.995319813   0.000000000   0.000000000
   0.000000000   0.995319813   0.000000000
   0.000000000   0.000000000   1.197882354

ATOMIC_POSITIONS (crystal)
Pb            0.0000000000        0.0000000000       -0.0166356359
O             0.5000000000        0.5000000000        0.1549702780
Ti            0.5000000000        0.5000000000        0.5327649171
O             0.0000000000        0.5000000000        0.6381882204
O             0.5000000000        0.0000000000        0.6381882204
End final coordinates

I have found how to extract the entire block of lines between the Begin final coordinates and End final coordinates patterns, but I need it to be more refined. First, I would like to extract the three lines below the line starting with CELL_PARAMETERS. Then, with a separate command (not in the same awk call), I would like to extract the 5 lines below ATOMIC_POSITIONS.
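For comparison, one common way to pull out the whole Begin/End block (not necessarily the method the asker found) is an awk range pattern, which prints every line from the first match through the second, inclusive. A minimal sketch; the function name final_block is hypothetical:

```shell
# Hypothetical helper: print the whole final block, marker lines included.
# Both markers occur only once in the file, so the range pattern fires exactly once.
final_block() {
  awk '/^Begin final coordinates/,/^End final coordinates/' "$1"
}

# usage: final_block file
```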

I should point out one thing: I said at the beginning that the block of text appears only once, and this is true for that specific form, with Begin final coordinates and End final coordinates. Throughout the data file, however, there are many blocks of this form:

CELL_PARAMETERS (alat=  7.29434300)
   0.995319813   0.000000000   0.000000000
   0.000000000   0.995319813   0.000000000
   0.000000000   0.000000000   1.197882354

ATOMIC_POSITIONS (crystal)
Pb            0.0000000000        0.0000000000       -0.0166356359
O             0.5000000000        0.5000000000        0.1549702780
Ti            0.5000000000        0.5000000000        0.5327649171
O             0.0000000000        0.5000000000        0.6381882204
O             0.5000000000        0.0000000000        0.6381882204

So unfortunately I cannot just use the CELL_PARAMETERS and ATOMIC_POSITIONS lines as patterns. The only lines that appear exactly once are Begin final coordinates and End final coordinates, so I have to extract text relative to them.
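A minimal sketch of extracting relative to the unique markers, assuming the header text is matched literally at the start of a line: a flag is switched on at Begin final coordinates and off at End final coordinates, so the identical CELL_PARAMETERS and ATOMIC_POSITIONS headers elsewhere in the file never open a window. The function name after_header_in_final and its argument order are hypothetical:

```shell
# Hypothetical helper: print the N lines that follow the line starting with HEADER,
# looking only inside the Begin/End final coordinates block.
#   $1 = file, $2 = header text (matched literally at line start), $3 = N
after_header_in_final() {
  awk -v hdr="$2" -v n="$3" '
    /^Begin final coordinates/ { inblk = 1; next }   # entering the unique block
    /^End final coordinates/   { inblk = 0; next }   # leaving it
    !inblk { next }                                  # ignore all other blocks
    c > 0  { print; c--; next }                      # inside the window: print
    index($0, hdr) == 1 { c = n }                    # header found: open window
  ' "$1"
}

# usage: after_header_in_final file CELL_PARAMETERS 3
#        after_header_in_final file ATOMIC_POSITIONS 5
```

Unlike fixed line offsets, this keeps working if the number of lines between Begin final coordinates and each header changes, as long as the header names stay the same.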

I have tried to combine the method for extracting lines between two patterns from here with the one for skipping N lines after finding a pattern from here. Unfortunately, I can't make it work.

So my idea was:

  1. for the first case: I was trying to find the Begin final coordinates pattern, skip 5 lines (including the one with the pattern), then print the 3 lines I am interested in, and then skip the rest until End final coordinates.

  2. for the second case: find Begin final coordinates, then skip the lines up to and including ATOMIC_POSITIONS, and print the next 5 lines, up to End final coordinates.

Can this be done?
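The skip-then-print idea can be expressed as a single parameterised awk call. This sketch assumes the final block always has the layout shown above; skip_then_print is a hypothetical name, and K counts the lines skipped after the Begin line, so "skip 5 lines including the one with the pattern" becomes K=4:

```shell
# Hypothetical helper: after the Begin line, skip K lines, then print N lines.
#   $1 = file, $2 = K (lines skipped after the Begin line), $3 = N (lines printed)
skip_then_print() {
  awk -v k="$2" -v n="$3" '
    /^Begin final coordinates/ { first = NR + k + 1; last = NR + k + n }
    first && NR >= first && NR <= last { print; if (NR == last) exit }
  ' "$1"
}

# first case, CELL_PARAMETERS rows:    skip_then_print file 4 3
# second case, ATOMIC_POSITIONS rows:  skip_then_print file 9 5
```

Because the offsets are hard-coded, it prints the wrong lines if the layout of the final block ever changes; the exit stops reading the file once the last wanted line has been printed.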

Update:

I have just tried this:

awk '/Begin final coordinates/ {n=NR+9} n < NR < n+3'

but I get a syntax error:

awk: cmd. line:1: /Begin final coordinates/ {n=NR+9} n<NR<n+3
awk: cmd. line:1:                                        ^ syntax error

What am I doing wrong here?
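awk comparison operators do not chain the way inequalities do in mathematics: gawk, which produced the message above, rejects n < NR < n+3 as a syntax error, so each bound needs its own comparison joined with &&. A corrected sketch, assuming the goal is the five ATOMIC_POSITIONS rows (atoms_fixed is a hypothetical wrapper name). With n = NR+9 set on the Begin line, n is the ATOMIC_POSITIONS header line, so the rows are n+1 to n+5 and the exclusive upper bound must be n+6 (n+3 would select only two rows). Because n is 0 until the Begin line is seen, the extra "n &&" guard stops the first five lines of the file from matching:

```shell
# Hypothetical wrapper around the corrected one-liner.
# (n < NR) && (NR < n + 6) replaces the invalid chained form n < NR < n+3;
# the leading "n &&" skips every line before Begin final coordinates, where n is 0.
atoms_fixed() {
  awk '/Begin final coordinates/ { n = NR + 9 } n && (n < NR) && (NR < n + 6)' "$1"
}

# usage: atoms_fixed file
```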

Update2:

Hold the presses, I got it!

  1. this solves the first case:

    awk '/Begin final coordinates/{n=NR+4;m=NR+8} (n<NR) && (NR<m)' file

  2. this solves the second case:

    awk '/Begin final coordinates/{n=NR+9;m=NR+15} (n<NR) && (NR<m)' file

It is not very nice, but it will do the job!

lucian

3 Answers


Hold the presses, I got it!

  1. this solves the first case:

    awk '/Begin final coordinates/{n=NR+4;m=NR+8} (n<NR) && (NR<m)' file
    
  2. this solves the second case:

    awk '/Begin final coordinates/{n=NR+9;m=NR+15} (n<NR) && (NR<m)' file
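One way to check offset constants like these is to print each line of the final block together with its distance from the Begin line. A minimal sketch; number_from_begin is a hypothetical name. On the block shown in the question it puts the CELL_PARAMETERS rows at offsets 5 to 7 and the ATOMIC_POSITIONS rows at offsets 10 to 14, so with strict comparisons the second case needs n=NR+9 and m=NR+15.

```shell
# Hypothetical helper: print each line of the final block prefixed by its offset
# from the Begin line (the Begin line itself is offset 0); stop at End.
number_from_begin() {
  awk '/^Begin final coordinates/ { b = NR }
       b { printf "%d: %s\n", NR - b, $0 }
       /^End final coordinates/ { exit }' "$1"
}

# usage: number_from_begin file
```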
    
Fravadona
lucian

With this you only need to read the input once:

awk '/Begin final coordinates/{n1=NR+4;m1=NR+8; n2=NR+9;m2=NR+15}
     (n1<NR) && (NR<m1){ print > "CELL_PARAMETERS.txt"; }
     (n2<NR) && (NR<m2){ print > "ATOMIC_POSITIONS.txt"; }
     ' file
Luuk

Assuming that the Begin final block occurs only after all of the other blocks:

$ awk '/^Begin final/{f=1} c&&c--; f && /^CELL/{c=3}' file
   0.995319813   0.000000000   0.000000000
   0.000000000   0.995319813   0.000000000
   0.000000000   0.000000000   1.197882354

$ awk '/^Begin final/{f=1} c&&c--; f && /^ATOMIC/{c=5}' file
Pb            0.0000000000        0.0000000000       -0.0166356359
O             0.5000000000        0.5000000000        0.1549702780
Ti            0.5000000000        0.5000000000        0.5327649171
O             0.0000000000        0.5000000000        0.6381882204
O             0.5000000000        0.0000000000        0.6381882204

or, if it could appear anywhere, change c&&c--; to c{print; if (!--c) exit}.
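Spelled out in full, the anywhere-safe variant might look like the sketch below (cell_rows and atom_rows are hypothetical wrapper names). The early exit relies on pre-decrement: !--c becomes true on the last wanted line, when c drops from 1 to 0. A post-decrement test, !c--, looks at the old value of c, which is never 0 inside the c{...} action, so that form would never exit early, and any matching block after the final one would be printed as well.

```shell
# Hypothetical wrappers around the early-exit form of the commands above.
# f is set once "Begin final" is seen; the first matching header after it sets c,
# the next c lines are printed, and awk exits right after the last of them.
cell_rows() {
  awk '/^Begin final/ { f = 1 } c { print; if (!--c) exit } f && /^CELL/ { c = 3 }' "$1"
}
atom_rows() {
  awk '/^Begin final/ { f = 1 } c { print; if (!--c) exit } f && /^ATOMIC/ { c = 5 }' "$1"
}

# usage: cell_rows file; atom_rows file
```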

See https://stackoverflow.com/a/17914105/1745001 for related idioms.

Ed Morton