How to print line 6 through 10 after last match?

Question

After matching the last "Constrained", I want to print the 6th to 10th lines after it.

This is what I've tried:

awk '/Constrained/ { print ; for(n=6; n<10; n++) { getline ; print } }' filename

But it doesn't work. I was thinking of using tail -5 to get only the last 5 lines (lines 6 to 10 of only the last match).

You can test it with this:

************************** Constrained Symmetrised Forces **************************
 *                                                                                  *
 *                           Cartesian components (eV/A)                            *
 * -------------------------------------------------------------------------------- *
 *                         x                    y                    z              *
 *                                                                                  *
 * O               1     -0.03440             -0.03440              0.00000         *
 * O               2      0.03440              0.03440             -0.00000         *
 * O               3     -0.03440              0.03440             -0.00000         *
 * O               4      0.03440             -0.03440              0.00000         *
 * Ti              1      0.00000              0.00000              0.00000         *

I should get the lines that start with O and end with Ti. But throughout the file there are several occurrences of "Constrained".

Use `sed -n '6,10p' filename`, to print between line 6 and 10 included — Zelnes, Jul 15 '19 at 15:17
@Zelnes That prints lines 6 to 10 of the file, not after the match. — Barmar, Jul 15 '19 at 15:18
Do you only want to match the last `Constrained` in the file, or did you mean "after each match"? — Barmar, Jul 15 '19 at 15:23
That's significantly harder, because you don't know whether it's the last match until you get to the end of the file. — Barmar, Jul 15 '19 at 15:25
that's why I thought of getting all the matches, then using tail — Caterina, Jul 15 '19 at 15:26

score 3 · Accepted Answer · answered Jul 15 '19 at 16:09


All you need is:

grep -A10 Constrained file | tail -n 5
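To see why this works, here is a small sketch on generated input, assuming a grep that supports -A (GNU and BSD grep both do). The names make_sample and last_five are made up for the illustration and are not part of the answer.

```shell
# Hypothetical sample: two "Constrained" headers, each followed by 10 numbered lines.
make_sample() {
  for block in 1 2; do
    echo "*** Constrained block $block ***"
    for i in 1 2 3 4 5 6 7 8 9 10; do
      echo "block $block line $i"
    done
  done
}

# grep -A10 prints every match plus the 10 lines after it;
# tail -n 5 keeps only the last 5 lines of that combined output,
# i.e. lines 6..10 after the last match.
last_five() {
  grep -A10 Constrained "$1" | tail -n 5
}

sample=$(mktemp)
make_sample > "$sample"
last_five "$sample"   # prints "block 2 line 6" through "block 2 line 10"
rm -f "$sample"
```

This relies on the last match being followed by at least 10 lines; with fewer, tail would reach back into earlier output.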

answered Jul 15 '19 at 16:09

Ed Morton

Ed you are not fun at all ;-) Brilliant to see how most of us are grepping tacing awking while it is as easy as it can be ;-) – kvantour Jul 15 '19 at 16:52
Just to mention, this will fail if there are no 10 lines after the last "Constrained". – kvantour Jul 15 '19 at 16:54
Yeah a solution that just does what the OP mentioned later in her question (`I should get the lines that start with O and end with Ti.`) would be more robust but I'm assuming the OP would let us know of any rainy day cases to be considered. – Ed Morton Jul 15 '19 at 17:01
Actually, this always brings me to the question. With awk you can do basically everything related to sed,grep,cat,ack,tr,uniq,... . But when should pipelines be recommended? They are often much more readable. – kvantour Jul 15 '19 at 17:02
It's art rather than science, but I tend to use pipelines (or individual other commands) when the result is clearer and simpler than an awk script and I don't expect to have to expand on the code in future and it's not something I want to take on in awk simply for the learning experience. If you gain experience using awk for most of the relatively simple (but still appropriate and non-trivial with existing tools) stuff where you don't REALLY need it, you'll find it's then much easier to use awk for the complicated stuff where you do. – Ed Morton Jul 15 '19 at 17:17
This reads and processes the whole file. (If e. g. the match is in between the last 100 lines in a 10G file, grepping the whole file is significantly slower.) – steffen Jul 16 '19 at 11:25
Right - as always you start with a simple solution and if performance turns out to be an issue then you work on solving that problem. – Ed Morton Jul 16 '19 at 12:24

Chris_Rands · Answer 2 · 2019-07-15T15:37:14.747

2

One option: reverse the file, find the first match plus 10 lines, re-reverse, take the last 5 lines:

tac filename | grep -B10 Constrained -m 1 | tac | tail -n 5

edited Jul 15 '19 at 15:37

answered Jul 15 '19 at 15:29


what does the -m1 flag do? – Caterina Jul 15 '19 at 15:39
@Caterina type `man grep` in your terminal or google `grep man page`. – Ed Morton Jul 15 '19 at 16:04

steffen · Answer 3 · 2019-07-16T15:55:25.840

2

tac file | grep "Constrained" -m1 -B10 | tac | tail -n5

tac reverses the file, so you can find the last match easily using grep -m1. Next, you want the 10 lines (5 to skip and 5 to print) before the match ('before', because the input is reversed). The second tac reverses the output again so you get the original line order, and tail -n5 drops the lines between Constrained and the 6th line after the match.
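The four stages can be tried on generated input with the sketch below. It assumes GNU coreutils (for tac) and a grep with -m and -B; make_blocks and last_match_lines are hypothetical helper names used only for this illustration.

```shell
# Hypothetical sample: three headers, each followed by 10 numbered lines.
make_blocks() {
  for block in 1 2 3; do
    echo "== Constrained $block =="
    for i in 1 2 3 4 5 6 7 8 9 10; do
      echo "b$block-$i"
    done
  done
}

last_match_lines() {
  # 1. tac:            last line first, so the last match is seen first
  # 2. grep -m1 -B10:  stop at that first match, keep the 10 lines "before" it
  # 3. tac:            restore the original order (header, then its 10 followers)
  # 4. tail -n 5:      keep followers 6..10
  tac "$1" | grep -m1 -B10 "Constrained" | tac | tail -n 5
}

blocks=$(mktemp)
make_blocks > "$blocks"
last_match_lines "$blocks"   # prints b3-6 through b3-10
rm -f "$blocks"
```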

Of course, you can do that with a simple grep, but this will read and process the whole file and can be significantly slower. tac starts reading from the end of file.

grep -A10 "Constrained" file | tail -n5

With awk (also reading the whole file):

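awk '/Constrained/{f=NR;b=""} f && NR>=f+6 && NR<=f+10{b=b $0 ORS} END{printf "%s", b}' file

Searches for Constrained, sets the initial line number (f to the current line) and clears the buffer (dropping previous results). Then collects lines into b as long as the line numbers fall in the region. The f guard keeps lines 6-10 of the file from being collected before the first match, and appending ORS after each line avoids a leading blank line in the output.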
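The same single-pass idea, spread over several lines with comments, might look like the sketch below. It is a variant rather than the exact one-liner: the function name last_match_region is made up, and the guard on f plus the trailing ORS are assumptions about the desired behaviour (no output when nothing matches, no leading blank line).

```shell
last_match_region() {
  awk '
    /Constrained/ { f = NR; b = "" }      # new match: remember its line number, drop older lines
    f && NR >= f + 6 && NR <= f + 10 {    # only after a match, and only lines 6..10 after it
      b = b $0 ORS                        # append the line to the buffer
    }
    END { printf "%s", b }                # the buffer holds lines from the last match only
  ' "$1"
}

region=$(mktemp)
{ echo Constrained; seq 1 10; echo Constrained; seq 101 110; } > "$region"
last_match_region "$region"   # prints 106 through 110
rm -f "$region"
```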

edited Jul 16 '19 at 15:55

answered Jul 15 '19 at 15:32

@Caterina naah, same answer as the accepted one, but with more explanations (took 3 minutes to write) :-P I should stop writing explanations. – steffen Jul 15 '19 at 15:48
no you should not ;-) explanations are more useful. – kvantour Jul 15 '19 at 15:52
This answer is almost correct but I'm not getting the last line with "Ti" – Caterina Jul 15 '19 at 16:10
@Caterina See update. I also added something to consider (`grep | tail` vs. `tac | grep | tac | tail`). – steffen Jul 16 '19 at 11:23
Just in case, I upvoted your answer :) But I chose the other one because it was simpler – Caterina Jul 16 '19 at 13:47

Barmar · Answer 4 · 2019-07-15T15:29:19.817

1

The easiest way I can think of to do that is to read the file twice. The first pass finds the last line number of the match, the second pass prints 6-10 after it.

awk 'FNR==NR && /Constrained/ { line=NR }
     FNR!=NR && FNR >= line+6 && FNR <= line+10' filename filename
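A wrapped variant of the two-pass approach is sketched below so it can be tried on generated input. The function name two_pass is hypothetical, and the next statement and the guard on line are additions for illustration (they skip the second rule during the first pass and print nothing when there is no match). The argument must be a regular file, since it is read twice.

```shell
two_pass() {
  awk '
    FNR == NR { if (/Constrained/) line = NR; next }   # pass 1: remember the last matching line number
    line && FNR >= line + 6 && FNR <= line + 10        # pass 2: print lines 6..10 after it
  ' "$1" "$1"
}

twice=$(mktemp)
{ echo Constrained; seq 1 10; echo Constrained; seq 101 110; } > "$twice"
two_pass "$twice"   # prints 106 through 110
rm -f "$twice"
```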

edited Jul 15 '19 at 15:29

answered Jul 15 '19 at 15:19


I guess OP wants to match last `Constrained` – anubhava Jul 15 '19 at 15:21
@anubhava Possible. "last match" could mean the last match in the file, or it could mean the most recent match. – Barmar Jul 15 '19 at 15:24
I meant the last match in the file – Caterina Jul 15 '19 at 15:25
This is printing the match too. I don't really want that line – Caterina Jul 15 '19 at 15:27
Then why do you have `print;` before the `for` loop? – Barmar Jul 15 '19 at 15:29
I removed that from the answer. – Barmar Jul 15 '19 at 15:31
@kvantour Won't that make the condition match during the first pass? – Barmar Jul 15 '19 at 17:25
@kvantour I just tested it. If I remove that, it prints the lines after all matches of `Constrained`, not just the last match in the file. – Barmar Jul 15 '19 at 17:27
If `Constrained` is on lines 2 and 14, it prints lines 8-12 and 20-24. It also prints 20-24 twice, once during the first pass and again during the second pass. – Barmar Jul 15 '19 at 17:28
@Barmar you are correct. I overlooked that condition. `awk '(FNR==NR) && /Constrained/{ line=NR}(FNR==NR){next}(FNR >= line+6 && FNR <= line+10)'` but it is less clean. – kvantour Jul 15 '19 at 18:27

kvantour · Answer 5 · 2019-07-15T16:49:58.233

Reading the file a single time, but keeping track of a buffer:

awk '(c-->0){b[10-c]=$0}
     /Constrained/{c=10}
     END{for(i=6;i<=10;++i) print b[i] }' file

How does this work?

The array b is a buffer which will always contain the 10 lines following a match of the pattern /Constrained/. A counter c is used to count down to zero. Every time a match of the pattern is found, it is reset to the maximum value of 10. The program works like this:

1. Read a line (default awk action).
2. Check if the counter c is bigger than zero and decrease it by 1 (See What is the "-->" operator in C++?). If this condition is met, store the line in the buffer b. Since c is already 9 (10-1) when the first line is stored, store it at position 10 − c. This way the lines after the match are indexed as 1,2,3,...,10.
3. If the pattern /Constrained/ is matched, reset the counter c to 10.
4. Go back to 1 unless you are at the end of the file.
5. Once the whole file has been processed, the buffer b contains the 10 lines after the last match. Just print lines 6 to 10.
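To make the indexing visible, the sketch below runs the same countdown on generated input and prints every buffer slot with its index instead of only slots 6 to 10. The function name show_buffer and the sample input are made up for this illustration.

```shell
show_buffer() {
  awk '
    (c-- > 0)     { b[10 - c] = $0 }   # c was 10..1 before the decrement, so the index runs 1..10
    /Constrained/ { c = 10 }           # restart the countdown on every match
    END { for (i = 1; i <= 10; ++i) if (i in b) print i ": " b[i] }
  ' "$1"
}

buf=$(mktemp)
{ echo Constrained; seq 1 10; echo Constrained; seq 101 110; } > "$buf"
show_buffer "$buf"   # prints "1: 101" through "10: 110"
rm -f "$buf"
```

The lines after the first match are stored first and then overwritten slot by slot once the second match restarts the countdown.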

A couple of cleanups:

There is no guarantee that 10 lines follow the last match, so you have to make sure the buffer from the previous match is fully erased.

$ awk '(c-->0){b[10-c]=$0}
       /Constrained/{c=10; delete b}
       END{for(i=6;i<=10;++i) if (i in b) print b[i] }' file

Parametrised version:

A parametrised version allows for arbitrary ranges. But imagine you want lines 10000 to 10001 after the match: buffering every line up to 10001 just to print two of them would be wasteful. So we store only the lines that fall inside the range:

$ awk '(c-->0) && (c<=max-min){b[max-c]=$0}
       ($0~ere){c=max; delete b}
       END{for(i=min;i<=max;++i) if (i in b) print b[i] }' \
       min=6 max=10 ere="Constrained" file

Be advised that min has to be bigger than 0.

Proof of principle:

$ awk '(c-->0) && (c<=max-min){b[max-c]=$0}
       ($0~ere){ c=max; delete b}
       END{for(i=min;i<=max;i++) if(i in b) print b[i] }' \
       min=6 max=10 ere="20" <( seq 1 50 && seq 101 150 )
126
127
128
129
130

yoga · Answer 6 · 2019-07-15T17:55:00.973

0

I would suggest trying this one. -A10 prints the 10 lines after each match of the word, and tail keeps the last 5 of them. (An earlier version of this answer also passed -m1 to stop reading at the first match, but the whole file has to be read to find the last match.)

grep -A10 Constrained file | tail -5

edited Jul 15 '19 at 17:55

answered Jul 15 '19 at 16:12

Yes she does want (need) to read the whole file because otherwise she won't know where the **last** Constrained occurs. What you posted will print the last 5 lines of the **first** Constrained block which is a much simpler problem than the problem the question is about. – Ed Morton Jul 15 '19 at 16:13
gotcha. i didnt read the question correctly. -m1 should not be there – yoga Jul 15 '19 at 17:54
Now your answer's identical to mine (https://stackoverflow.com/a/57043558/1745001) which I posted before you answered. – Ed Morton Jul 15 '19 at 17:57
when i was updating the query, i noticed you already answered it. good for you. – yoga Jul 15 '19 at 17:58
I was expecting more of a "oh, I hadn't noticed, I'll delete my duplicate answer" but OK. – Ed Morton Jul 15 '19 at 20:59
mine is still different because it saves 2 characters in the command compared to yours – yoga Jul 16 '19 at 13:30
YMMV with that as, [per POSIX](https://pubs.opengroup.org/onlinepubs/9699919799/utilities/tail.html), `If neither -c nor -n is specified, -n 10 shall be assumed.`. I understand that pre-POSIX tail didn't have -n or -c options and so most modern tails so far continue to support `-<number>` as if you wrote `-n <number>` but I wouldn't rely on it since it is undefined behavior at best (and arguably counter-POSIX since POSIX states that without `-n` or `-c` the command should be treated as `-n 10`). – Ed Morton Jul 16 '19 at 13:56
exactly. This is still in working state and no plan to remove this option is specified/proposed yet. – yoga Jul 16 '19 at 14:37
no plan to remove it? It's already been removed from the standard, there's almost certainly versions of tail out there that already don't support it (a quick google for "unix tail error" didn't find that but did find a reference to its corollary `+` being removed from Linux RHEL5), and even in the versions that do still support it it already isn't supported in combination with other arguments (e.g. with GNU tail `seq 10 | tail -f -n 3` outputs `8 9 10` as expected while `seq 10 | tail -f -3` outputs `tail: option used in invalid context -- 3`). I'm just saying - YMMV. – Ed Morton Jul 16 '19 at 14:56
