1

After matching the last "Constrained", I want to print the 6th to 10th lines that follow it:

This is what I've tried:

awk '/Constrained/ { print ; for(n=6; n<10; n++) { getline ; print } }' filename

But it doesn't work. I was thinking of using tail -5 to get only the last 5 lines (lines 6 to 10 after the last match only).

You can test it with this:

************************** Constrained Symmetrised Forces **************************
 *                                                                                  *
 *                           Cartesian components (eV/A)                            *
 * -------------------------------------------------------------------------------- *
 *                         x                    y                    z              *
 *                                                                                  *
 * O               1     -0.03440             -0.03440              0.00000         *
 * O               2      0.03440              0.03440             -0.00000         *
 * O               3     -0.03440              0.03440             -0.00000         *
 * O               4      0.03440             -0.03440              0.00000         *
 * Ti              1      0.00000              0.00000              0.00000         *

I should get the lines from the first one that starts with O to the one that starts with Ti. But there are several "Constrained" blocks throughout the file, and I only want the last one.

Caterina

6 Answers

3

All you need is:

grep -A10 Constrained file | tail -n 5
Ed Morton
  • Ed you are not fun at all ;-) Brilliant to see how most of us are grepping, tac-ing and awk-ing while it is as easy as it can be ;-) – kvantour Jul 15 '19 at 16:52
  • Just to mention, this will fail if there are fewer than 10 lines after the last "Constrained". – kvantour Jul 15 '19 at 16:54
  • Yeah a solution that just does what the OP mentioned later in her question (`I should get the lines that start with O and end with Ti.`) would be more robust but I'm assuming the OP would let us know of any rainy day cases to be considered. – Ed Morton Jul 15 '19 at 17:01
  • Actually, this always brings me to the question: with awk you can do basically everything related to sed, grep, cat, ack, tr, uniq, ... But when should pipelines be recommended? They are often much more readable. – kvantour Jul 15 '19 at 17:02
  • It's art rather than science, but I tend to use pipelines (or individual other commands) when the result is clearer and simpler than an awk script, I don't expect to have to expand on the code in future, and it's not something I want to take on in awk simply for the learning experience. If you gain experience using awk for most of the relatively simple (but still appropriate and non-trivial with existing tools) stuff where you don't REALLY need it, you'll find it's then much easier to use awk for the complicated stuff where you do. – Ed Morton Jul 15 '19 at 17:17
  • This reads and processes the whole file. (If e.g. the match is within the last 100 lines of a 10G file, grepping the whole file is significantly slower.) – steffen Jul 16 '19 at 11:25
  • Right - as always you start with a simple solution and if performance turns out to be an issue then you work on solving that problem. – Ed Morton Jul 16 '19 at 12:24
2

One option: reverse the file, find the first match plus 10 lines, re-reverse, take the last 5 lines:

tac filename | grep -B10 Constrained -m 1 | tac | tail -n 5
Chris_Rands
2
tac file | grep "Constrained" -m1 -B10 | tac | tail -n5

tac reverses the file, so grep -m1 finds the last match first and stops there. -B10 keeps the 10 lines before the match in the reversed stream (5 to skip and 5 to print); 'before', because the input is reversed, so these are the 10 lines that follow the match in the original file. The second tac restores the original line order, and tail -n5 drops the match line and the 5 lines between Constrained and the 6th line after it.
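
To see the pipeline at work, here is a minimal sketch on a made-up input. The temporary file and its contents are assumptions for illustration, not the OP's data, and tac and seq are assumed to be the GNU coreutils versions:

```shell
# Hypothetical sample: two "Constrained" blocks, each followed by 10 numbered lines.
sample=$(mktemp)
{
  echo "Constrained A"; seq 1 10
  echo "Constrained B"; seq 101 110
} > "$sample"

# tac:          read the file last line first
# grep -m1 -B10: stop at the first match (the last one in the file) and keep
#               the 10 lines printed before it in the reversed stream
# tac:          restore the original order
# tail -n 5:    keep only lines 6-10 after the match
out=$(tac "$sample" | grep -m1 -B10 "Constrained" | tac | tail -n 5)
printf '%s\n' "$out"   # prints 106 to 110, one per line
rm -f "$sample"
```

Only the block after "Constrained B" contributes to the output, because grep never gets as far as "Constrained A" in the reversed stream.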

Of course, you can also do it with a plain grep, but that reads and processes the whole file and can be significantly slower, whereas tac starts reading from the end of the file:

grep -A10 "Constrained" file | tail -n5

With awk (also reading the whole file):

awk '/Constrained/{f=NR;b=""} f && NR>=f+6 && NR<=f+10{b=b $0 ORS} END{printf "%s", b}' file

Searches for Constrained, records the current line number in f and clears the buffer b (discarding results from earlier matches). It then appends lines to b while the line number falls within f+6 to f+10.

steffen
1

The easiest way I can think of to do that is to read the file twice. The first pass finds the line number of the last match; the second pass prints lines 6-10 after it.

awk 'FNR==NR && /Constrained/ { line=NR }
     FNR!=NR && FNR >= line+6 && FNR <= line+10' filename filename
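
As a rough illustration of the two-pass idea, the sketch below runs a commented variant on a made-up file. The sample data is an assumption, and the extra `line &&` guard (print nothing when there is no match at all) is an addition that is not part of the command above:

```shell
# Hypothetical sample: two "Constrained" blocks, each followed by 10 numbered lines.
sample=$(mktemp)
{ echo "Constrained A"; seq 1 10; echo "Constrained B"; seq 101 110; } > "$sample"

out=$(awk '
  # Pass 1 (FNR == NR only while reading the first copy): remember the last match.
  FNR == NR { if (/Constrained/) line = NR; next }
  # Pass 2: print lines 6-10 after that match; "line &&" skips files with no match.
  line && FNR >= line + 6 && FNR <= line + 10
' "$sample" "$sample")

printf '%s\n' "$out"   # prints 106 to 110, one per line
rm -f "$sample"
```

Reading the file twice costs a second pass over the data, but nothing has to be buffered in memory.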
Barmar
1

Reading the file a single time, but keeping track of a buffer:

awk '(c-->0){b[10-c]=$0}
     /Constrained/{c=10}
     END{for(i=6;i<=10;++i) print b[i] }' file

How does this work?

The array b is a buffer that always contains the (up to) 10 lines following the most recent match of the pattern /Constrained/. A counter c counts down to zero. Every time the pattern matches, it is reset to its maximum value of 10. The program works like this:

  1. Read a line (default awk action)
  2. Check if the counter c is bigger than zero and decrease it by 1 (See What is the "-->" operator in C++?). If this condition is met, store the line in the buffer b. Since c has already been decreased and so starts at 9 (10-1), store the line at position 10 − c. This way the lines after the match are indexed as 1,2,3,...,10.
  3. If the pattern /Constrained/ is matched, reset the counter c to 10.
  4. Go back to 1 unless you are at the end of the file.
  5. Once the whole file has been processed, the buffer b contains the 10 lines after the last match. Just print lines 6 to 10.
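
The steps above can be traced on a small made-up input. This is only a sketch; the sample file is an assumption for illustration:

```shell
# Hypothetical sample: two "Constrained" blocks, each followed by 10 numbered lines.
sample=$(mktemp)
{ echo "Constrained A"; seq 1 10; echo "Constrained B"; seq 101 110; } > "$sample"

out=$(awk '
  # Step 2: test c first, then decrement it; store the line at index 10 - c.
  # This rule comes before the reset, so the matching line itself is never stored.
  (c-- > 0)     { b[10 - c] = $0 }
  # Step 3: a new match restarts the countdown; later lines overwrite b[1..10].
  /Constrained/ { c = 10 }
  # Step 5: b now holds the 10 lines after the last match.
  END           { for (i = 6; i <= 10; ++i) print b[i] }
' "$sample")

printf '%s\n' "$out"   # prints 106 to 110, one per line
rm -f "$sample"
```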

A couple of cleanups:

There are not necessarily 10 lines after the last match of the pattern, so you have to make sure the buffer from the previous match is fully erased, and only print the entries that exist.

$ awk '(c-->0){b[10-c]=$0}
       /Constrained/{c=10; delete b}
       END{for(i=6;i<=10;++i) if (i in b) print b[i] }' file
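
A quick way to see why the cleanup matters is a made-up file in which the last match is followed by fewer than 10 lines. The sample data and the variable names stale and clean are assumptions for illustration:

```shell
# Hypothetical sample: the last "Constrained" is followed by only 7 lines.
sample=$(mktemp)
{ echo "Constrained A"; seq 1 10; echo "Constrained B"; seq 101 107; } > "$sample"

# Without clearing the buffer, b[8..10] still hold lines from the first block.
stale=$(awk '(c-->0){b[10-c]=$0}
             /Constrained/{c=10}
             END{for(i=6;i<=10;++i) print b[i]}' "$sample")

# With "delete b" and the "i in b" guard, only lines that really exist are printed.
clean=$(awk '(c-->0){b[10-c]=$0}
             /Constrained/{c=10; delete b}
             END{for(i=6;i<=10;++i) if (i in b) print b[i]}' "$sample")

echo "stale:" $stale   # stale: 106 107 8 9 10
echo "clean:" $clean   # clean: 106 107
rm -f "$sample"
```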

Parametrised version:

A parameterised version makes the range configurable. But imagine you want only lines 10000 to 10001 after the match: storing every line up to max would make the buffer really big for just two lines. So we only store the lines from min to max:

$ awk '(c-->0) && (c<=max-min){b[max-c]=$0}
       ($0~ere){c=max; delete b}
       END{for(i=min;i<=max;++i) if (i in b) print b[i] }' \
       min=6 max=10 ere="Constrained" file

Be advised that min has to be bigger than 0.

Proof of principle:

$ awk '(c-->0) && (c<=max-min){b[max-c]=$0}
       ($0~ere){ c=max; delete b}
       END{for(i=min;i<=max;i++) if(i in b) print b[i] }' \
       min=6 max=10 ere="20" <( seq 1 50 && seq 101 150 )
126
127
128
129
130
kvantour
0

I would suggest trying this one. -A10 prints the 10 lines after each match of the word. -m tells grep when to stop reading the file. We don't want to read the whole file, do we?

grep -A10 Constrained file | tail -5
yoga
  • Yes she does want (need) to read the whole file because otherwise she won't know where the **last** Constrained occurs. What you posted will print the last 5 lines of the **first** Constrained block, which is a much simpler problem than the one the question is about. – Ed Morton Jul 15 '19 at 16:13
  • gotcha. I didn't read the question correctly. -m1 should not be there – yoga Jul 15 '19 at 17:54
  • Now your answer's identical to mine (https://stackoverflow.com/a/57043558/1745001) which I posted before you answered. – Ed Morton Jul 15 '19 at 17:57
  • when i was updating the query, i noticed you already answered it. good for you. – yoga Jul 15 '19 at 17:58
  • I was expecting more of a "oh, I hadn't noticed, I'll delete my duplicate answer" but OK. – Ed Morton Jul 15 '19 at 20:59
  • mine is still different because it saves 2 characters in the command compared to yours – yoga Jul 16 '19 at 13:30
  • YMMV with that as, [per POSIX](https://pubs.opengroup.org/onlinepubs/9699919799/utilities/tail.html), `If neither -c nor -n is specified, -n 10 shall be assumed.`. I understand that pre-POSIX tail didn't have -n or -c options and so most modern tails so far continue to support `-<number>` as if you wrote `-n <number>` but I wouldn't rely on it since it is undefined behavior at best (and arguably counter-POSIX since POSIX states that without `-n` or `-c` the command should be treated as `-n 10`). – Ed Morton Jul 16 '19 at 13:56
  • exactly. This is still in working state and no plan to remove this option is specified/proposed yet. – yoga Jul 16 '19 at 14:37
  • no plan to remove it? It's already been removed from the standard, there's almost certainly versions of tail out there that already don't support it (a quick google for "unix tail error" didn't find that but did find a reference to its corollary `+` being removed from Linux RHEL5), and even in the versions that do still support it it already isn't supported in combination with other arguments (e.g. with GNU tail `seq 10 | tail -f -n 3` outputs `8 9 10` as expected while `seq 10 | tail -f -3` outputs `tail: option used in invalid context -- 3`). I'm just saying - YMMV. – Ed Morton Jul 16 '19 at 14:56