How to get the part of a file after the first line that matches a regular expression

Question

I have a file with about 1000 lines. I want the part of my file after the line which matches my grep statement.

That is:

cat file | grep 'TERMINATE'     # It is found on line 534

So, I want the file from line 535 to line 1000 for further processing.

How can I do that?

I know that, it's like I use it that way. Let's come back to the question. — Yugal Jindle, Aug 18 '11 at 07:03
This is a perfectly fine programming question, and well suited for stackoverflow. — aioobe, Aug 18 '11 at 07:06
@Jacob It's not useless use of cat at all. Its use is to print a file to standard output, which means we can use `grep`'s standard input interface to read data in, rather than having to learn what switch to apply to `grep`, and `sed`, and `awk`, and `pandoc`, and `ffmpeg` etc. when we want to read from a file. It saves time because we don't have to learn a new switch every time we want to do the same thing: read from a file. — runeks, Aug 13 '16 at 06:56
@runeks I agree with your sentiment - but you can achieve that without cat: `grep 'TERMINATE' < file`. Maybe it does make the reading a bit harder - but this is shell scripting, so that's always going to be a problem :) — LOAS, Aug 30 '17 at 07:23
See [this answer from Ed Morton](https://stackoverflow.com/a/17914105/8344060) — kvantour, Jul 22 '19 at 12:35

score 357 · Accepted Answer · edited Oct 29 '21 at 16:01

The following will print from the first line matching TERMINATE to the end of the file (including the matching line):

sed -n -e '/TERMINATE/,$p'

Explained: -n disables sed's default behavior of printing each line after executing its script on it, -e introduces a script for sed, /TERMINATE/,$ is an address (line) range selection meaning from the first line matching the TERMINATE regular expression (like grep) to the end of the file ($), and p is the print command, which prints the current line.
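As a quick check of the inclusive form, here is a minimal sketch on a throwaway file. The file contents and the names `sample` and `from_match` are made up for illustration.

```shell
# Build a small sample file; the contents are illustrative only.
sample=$(mktemp)
printf '%s\n' 'alpha' 'beta' 'TERMINATE' 'gamma' 'delta' > "$sample"

# Print from the first matching line through the end of the file.
from_match=$(sed -n -e '/TERMINATE/,$p' "$sample")
printf '%s\n' "$from_match"

rm -f "$sample"
```

The captured output starts at the TERMINATE line and runs to the last line of the file.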

This will print from the line that follows the line matching TERMINATE till the end of the file: (from AFTER the matching line to EOF, NOT including the matching line)

sed -e '1,/TERMINATE/d'

Explained: 1,/TERMINATE/ is an address (line) range selection meaning from the first line of the input to the first line matching the TERMINATE regular expression, and d is the delete command, which deletes the current line and skips to the next line. As sed's default behavior is to print the lines, it will print the lines after TERMINATE to the end of the input.
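The sketch below shows the exclusive form on a sample file, plus one edge case worth knowing: when the marker sits on line 1, the end address of the range is only tested from line 2 onwards, so the range runs to the next match. The awk one-liner shown as an alternative is a swapped-in technique (it also appears in the comments further down), and all file contents and variable names are illustrative.

```shell
# Normal case: marker somewhere after line 1.
sample=$(mktemp)
printf '%s\n' 'alpha' 'TERMINATE' 'gamma' 'delta' > "$sample"
after_match=$(sed -e '1,/TERMINATE/d' "$sample")

# Edge case: marker on line 1 and again on line 3.
first_line=$(mktemp)
printf '%s\n' 'TERMINATE' 'gamma' 'TERMINATE' 'delta' > "$first_line"

# sed deletes lines 1 through 3 here, because the end regex is first
# checked on line 2, so only the last line survives.
sed_result=$(sed -e '1,/TERMINATE/d' "$first_line")

# awk alternative: print once the flag is set, then set the flag on a match.
# This prints everything after the first match, even when it is on line 1.
awk_result=$(awk 'found; /TERMINATE/ { found = 1 }' "$first_line")

printf 'sed: %s\nawk: %s\n' "$sed_result" "$awk_result"
rm -f "$sample" "$first_line"
```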

If you want the lines before TERMINATE:

sed -e '/TERMINATE/,$d'

And if you want both lines before and after TERMINATE in two different files in a single pass:

sed -e '1,/TERMINATE/w before
/TERMINATE/,$w after' file

Both the before and after files will contain the TERMINATE line, so to process each without it, use:

head -n -1 before
tail -n +2 after
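Here is a minimal end-to-end sketch of the split, run in a temporary directory. Two things differ from the commands above and are my own choices: `-n` is added so sed writes the two files without also echoing the input, and `sed '$d'` is used to drop the last line because `head -n -1` is not available in every `head` implementation. The directory and variable names are illustrative.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'alpha' 'beta' 'TERMINATE' 'gamma' 'delta' > "$workdir/file"

# Run sed inside the directory so the bare names "before" and "after"
# are created there. The newline after each filename ends the name.
(cd "$workdir" && sed -n -e '1,/TERMINATE/w before
/TERMINATE/,$w after' file)

# Strip the marker line from each half.
before_only=$(sed '$d' "$workdir/before")      # drop the last line
after_only=$(tail -n +2 "$workdir/after")      # drop the first line

printf 'before:\n%s\nafter:\n%s\n' "$before_only" "$after_only"
rm -rf "$workdir"
```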

If you do not want to hard code the filenames in the sed script, you can:

before=before.txt
after=after.txt
sed -e "1,/TERMINATE/w $before
/TERMINATE/,\$w $after" file

But then you have to escape the $ meaning the last line so the shell will not try to expand the $w variable (note that we now use double quotes around the script instead of single quotes).

Note that the newline after each filename in the script is important: it is how sed knows where the filename ends.

How would you replace the hardcoded TERMINATE by a variable?

You would make a variable for the matching text and then do it the same way as the previous example:

matchtext=TERMINATE
before=before.txt
after=after.txt
sed -e "1,/$matchtext/w $before
/$matchtext/,\$w $after" file

To use a variable for the matching text with the previous examples:

## Print the line containing the matching text, till the end of the file:
## (from the matching line to EOF, including the matching line)
matchtext=TERMINATE
sed -n -e "/$matchtext/,\$p"

## Print from the line that follows the line containing the
## matching text, till the end of the file:
## (from AFTER the matching line to EOF, NOT including the matching line)
matchtext=TERMINATE
sed -e "1,/$matchtext/d"

## Print all the lines before the line containing the matching text:
## (from line-1 to BEFORE the matching line, NOT including the matching line)
matchtext=TERMINATE
sed -e "/$matchtext/,\$d"

The important points about replacing text with variables in these cases are:

Variables ($variablename) enclosed in single quotes ['] won't "expand" but variables inside double quotes ["] will. So, you have to change all the single quotes to double quotes if they contain text you want to replace with a variable.
The sed ranges also contain a $ and are immediately followed by a letter like: $p, $d, $w. They will also look like variables to be expanded, so you have to escape those $ characters with a backslash [\] like: \$p, \$d, \$w.
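One more point to keep in mind: the variable is pasted into a regular expression delimited by slashes, so a value containing `/` or regex metacharacters would need escaping first. As a hedged alternative (a swapped-in technique, not part of the sed approach above), awk can take the text with `-v` and search for it literally with `index()`. The marker value and variable names below are hypothetical.

```shell
sample=$(mktemp)
printf '%s\n' 'header' 'path=/usr/local/*' 'payload 1' 'payload 2' > "$sample"

# Hypothetical marker containing a slash and a regex metacharacter.
matchtext='/usr/local/*'

# index() does a literal substring search, so no regex escaping is needed.
# Note: awk -v still interprets backslash escapes in the assigned value.
after_literal=$(awk -v s="$matchtext" 'found; index($0, s) { found = 1 }' "$sample")

printf '%s\n' "$after_literal"
rm -f "$sample"
```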

How can we get the lines before TERMINATE and delete all that follows ? — Yugal Jindle, Aug 18 '11 at 08:43
How would you replace the hardcoded TERMINATE by a variable? — Sébastien Clément, Feb 09 '16 at 20:47
One use case that's missing here is how to print lines after the last marker (if there can be multiple of them in the file .. think log files etc). — mato, Nov 23 '16 at 14:52
The example `sed -e "1,/$matchtext/d"` does not work when `$matchtext` occurs in the first line. I had to change it to `sed -e "0,/$matchtext/d"`. — Karalga, Jan 27 '17 at 17:15
One stop shop for my problem. Prefer to _double upvote_ this answer, but I can't. — Pavan Kumar, Aug 03 '21 at 10:26
@Karalga had the same issue, except `sed -e "0,/$matchtext/d"` still displays `$matchtext` for me, so I did this: `sed -e "0,/$matchtext/d" | tail -n +2`. But `sed -e '1i\\n' | sed -e "1,/$matchtext/d"` should work universally. — Mxt, Apr 29 '22 at 20:21

score 77 · Answer 2 · edited Oct 29 '21 at 15:54

As a simple approximation you could use

grep -A100000 TERMINATE file

which greps for TERMINATE and outputs up to 100,000 lines following that line.

From the man page:

-A NUM, --after-context=NUM

Print NUM lines of trailing context after matching lines. Places a line containing a group separator (--) between contiguous groups of matches. With the -o or --only-matching option, this has no effect and a warning is given.
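Rather than guessing a large number, the context size can be taken from the file itself, as suggested in the comments on this answer. The sketch below assumes a grep that supports `-A` (GNU and BSD grep do). The arithmetic expansion is only there to strip any padding some `wc` implementations print. Variable names are illustrative.

```shell
sample=$(mktemp)
printf '%s\n' 'alpha' 'TERMINATE' 'gamma' 'delta' > "$sample"

# The total line count is always at least the number of lines after the match.
total=$(( $(wc -l < "$sample") ))

# Matching line plus everything after it.
with_match=$(grep -A "$total" 'TERMINATE' "$sample")

# Drop the first output line to exclude the matching line itself.
without_match=$(grep -A "$total" 'TERMINATE' "$sample" | tail -n +2)

printf '%s\n' "$without_match"
rm -f "$sample"
```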

answered Aug 18 '11 at 07:06 by aioobe · edited Oct 29 '21 at 15:54 by Peter Mortensen

That might work for this, but I need to code it into my script to process many files. So, show some generic solution. – Yugal Jindle Aug 18 '11 at 07:14
I think this is one practical solution! – michelgotta Apr 29 '13 at 15:48
similarly -B NUM, --before-context=NUM Print NUM lines of leading context before matching lines. Places a line containing a group separator (--) between contiguous groups of matches. With the -o or --only-matching option, this has no effect and a warning is given. – PiyusG Apr 29 '14 at 11:43
this solution worked for me because i can easily use variables as my string to check for. – Jose Martinez Mar 09 '16 at 17:15
Nice idea! If you are uncertain about the size of the context you may count the lines of `file` instead: `grep -A$(cat file | wc -l) TERMINATE file` – Lemming Aug 07 '17 at 11:56
I need something that limits characters, not lines. – Timothy Swan Nov 27 '17 at 19:43
If you want exactly the remaining lines of your file after the pattern TERMINATE, you can use this: `grep -A$(($(cat file | wc -l)-$(grep -n TERMINATE file | awk -F":" '{print $1}'))) TERMINATE file` – Ahmed Jul 06 '18 at 18:43
@Ahmed, how is that better than `grep -A$(wc -l < file) TERMINATE file`? – aioobe Jul 06 '18 at 20:28
@aioobe because it returns only the lines that remain for the end of file $(($(cat file | wc -l)-$(grep -n TERMINATE file | awk -F":" '{print $1}'))) – Ahmed Jul 07 '18 at 15:01
@Ahmed, but so does `grep -A$(wc -l < file) TERMINATE file`, right? – aioobe Jul 08 '18 at 12:50
Using `wc -l` to make sure you don't accidentally truncate lines is nice, but you just need `NUM > lines remaining` not `NUM == lines remaining`. The calculation of the "exact" number of lines remaining is going to read the file many more times than is necessary and is more complicated than the `sed` or `awk` solutions (the main advantage of `grep` is it's the easiest to remember). – szmoore Mar 13 '19 at 02:05
unfortunately grep doesn't support INFINITE as NUM for -A and -B option :( then we must add very big numbers, but we don't know what is maximum int for them. – Znik Nov 18 '20 at 11:16

score 29 · Answer 3 · edited Oct 29 '21 at 16:08

A tool to use here is AWK:

cat file | awk 'BEGIN{ found=0} /TERMINATE/{found=1}  {if (found) print }'

How does this work:

We set the variable 'found' to zero, which evaluates to false.
If a match for 'TERMINATE' is found with the regular expression, we set it to one.
If our 'found' variable evaluates to true, we print the line :)

Solutions that first read the whole file into memory might consume a lot of memory if you use them on very large files; this one processes the input line by line.
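The same flag logic can be written a little more compactly, with awk reading the file directly instead of through a pipe. This is a sketch of an equivalent form, not a correction. The sample contents and the name `inclusive` are illustrative.

```shell
sample=$(mktemp)
printf '%s\n' 'alpha' 'TERMINATE' 'gamma' 'delta' > "$sample"

# Uninitialized awk variables are false, so no BEGIN block is required.
# A bare pattern with no action prints the line when the pattern is true.
inclusive=$(awk '/TERMINATE/ { found = 1 } found' "$sample")

printf '%s\n' "$inclusive"
rm -f "$sample"
```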

answered Apr 18 '13 at 16:19 by Jos De Graeve · edited Oct 29 '21 at 16:08 by Peter Mortensen

Simple, elegant and very generic. In my case it was printing everything until second occurrence of '###': `cat file | awk 'BEGIN{ found=0} /###/{found=found+1} {if (found<2) print }'` – Aleksander Stelmaczonek Aug 16 '17 at 12:25
A tool *not* to use here is `cat`. `awk` is perfectly capable of taking one or more filenames as arguments. See also https://stackoverflow.com/questions/11710552/useless-use-of-cat – tripleee Aug 03 '18 at 05:18

score 12 · Answer 4 · edited Oct 29 '21 at 16:14

If I understand your question correctly you do want the lines after TERMINATE, not including the TERMINATE-line. AWK can do this in a simple way:

awk '{if(found) print} /TERMINATE/{found=1}' your_file

Explanation:

Although not best practice, you could rely on the fact that all variables default to 0 or the empty string if not defined. So the first expression (if(found) print) will not print anything to start off with.
After the printing is done, we check if this is the starter-line (that should not be included).

This will print all lines after the TERMINATE-line.

Generalization:

You have a file with start- and end-lines and you want the lines between those lines excluding the start- and end-lines.
start- and end-lines could be defined by a regular expression matching the line.

Example:

$ cat ex_file.txt
not this line
second line
START
A good line to include
And this line
Yep
END
Nope more
...
never ever
$ awk '/END/{found=0} {if(found) print} /START/{found=1}' ex_file.txt
A good line to include
And this line
Yep
$

Explanation:

If the end-line is found no printing should be done. Note that this check is done before the actual printing to exclude the end-line from the result.
Print the current line if found is set.
If the start-line is found then set found=1 so that the following lines are printed. Note that this check is done after the actual printing to exclude the start-line from the result.

Notes:

The code relies on the fact that all AWK variables default to 0 or the empty string if not defined. This is valid, but it may not be best practice, so you could add a BEGIN{found=0} to the start of the AWK expression.
If multiple start-end-blocks are found, they are all printed.
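If only the first block is wanted, one possible variation is to stop reading as soon as the first end-line is reached. The sketch below compares both behaviours on a sample file with two blocks. The sample contents and variable names are illustrative.

```shell
sample=$(mktemp)
printf '%s\n' 'skip' 'START' 'one' 'two' 'END' 'START' 'three' 'END' 'tail' > "$sample"

# Original form: every START..END block is printed (markers excluded).
all_blocks=$(awk '/END/ { found = 0 } found; /START/ { found = 1 }' "$sample")

# Variation: exit at the first END seen while inside a block.
first_block=$(awk '/END/ && found { exit } found; /START/ { found = 1 }' "$sample")

printf 'all:\n%s\nfirst:\n%s\n' "$all_blocks" "$first_block"
rm -f "$sample"
```

Exiting early also avoids reading the rest of a large file.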

Awesome Awesome example. Just spent 2 hours looking at csplit, sed, and all manner of over complicated awk commands. Not only did this do what I wanted but shown simple enough to infer how to modify it to do a few other related things I needed. Makes me remember awk is great and not just in indecipherable mess of crap. Thanks. — user1169420, Feb 19 '19 at 01:46
`{if(found) print}` is a bit of an anti-pattern in awk, it's more idiomatic to replace the block with just `found` or `found;` if you need another filter afterwards. — user000001, Apr 17 '19 at 13:28
@user000001 please explain. I do not understand what to replace and how. Anyway I think the way its written makes it very clear what is going on. — UlfR, Apr 17 '19 at 13:43
You would replace `awk '{if(found) print} /TERMINATE/{found=1}' your_file` with `awk 'found; /TERMINATE/{found=1}' your_file`, they should both do the same thing. — user000001, Apr 17 '19 at 13:45
Or you can condense it much further, down to `jot 23 | mawk '_ || /17/ && _++'`, which outputs `18 19 20 21 22 23`. — RARE Kpop Manifesto, Feb 28 '23 at 16:01

score 7 · Answer 5 · edited Oct 29 '21 at 17:35

grep -A 10000000 'TERMINATE' file

is much, much faster than sed, especially when working on a really big file. It works up to 10M lines (or whatever you put in), so there isn't any harm in making this big enough to handle about anything you hit.

answered Nov 08 '17 at 22:59 by user8910163 · edited Oct 29 '21 at 17:35 by Peter Mortensen

What do you mean by *"handle about anything you hit"* (seems incomprehensible)? Please respond by [editing (changing) your answer](https://stackoverflow.com/posts/47191172/edit), not here in comments (***without*** "Edit:", "Update:", or similar - the answer should appear as if it was written today). – Peter Mortensen Oct 29 '21 at 17:36

score 5 · Answer 6 · edited Oct 29 '21 at 15:53

Use Bash parameter expansion like the following:

content=$(cat file)
echo "${content#*TERMINATE}"
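To make the effect of the expansion visible, here is a small sketch that works on an in-memory string. It shows that `${content#*TERMINATE}` removes everything up to and including the first TERMINATE but keeps the rest of that line, and then adds a second expansion (my addition, not part of the answer) to drop that remainder. `printf` is used instead of `echo` for predictable output.

```shell
# A literal newline, used as a pattern below.
nl='
'
content=$(printf '%s\n' 'alpha' 'beta TERMINATE trailing' 'gamma' 'delta')

# Shortest prefix ending in TERMINATE is removed; " trailing" remains.
rest=${content#*TERMINATE}

# Remove through the first newline to start at the following line.
# If no newline remains (marker on the last line), rest is left unchanged.
lines_after=${rest#*"$nl"}

printf '%s\n' "$lines_after"
```

If the text does not contain TERMINATE at all, the first expansion leaves `content` unchanged, so a real script would need to check for that case.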

answered Aug 18 '11 at 07:04 by Mu Qiao · edited Oct 29 '21 at 15:53 by Peter Mortensen

Can you explain what are you doing ? – Yugal Jindle Aug 18 '11 at 07:13
I copied the content of "file" into the $content variable. Then I removed all the characters until "TERMINATE" was seen. It didn't use greedy matching, but you can use greedy matching by ${content##*TERMINATE}. – Mu Qiao Aug 18 '11 at 07:16
here is the link of the bash manual: http://www.gnu.org/software/bash/manual/bashref.html#Shell-Parameter-Expansion – Mu Qiao Aug 18 '11 at 07:17
what will happen if file is 100GB size ? – Znik Dec 22 '14 at 15:00
Downvote: This is horrible (reading the file into a variable) and wrong (using the variable without quoting it; and you should properly use `printf` or make sure you know exactly what you are passing to `echo`.). – tripleee Jul 25 '16 at 12:34

score 4 · Answer 7 · answered Jul 31 '14 at 10:40

There are many ways to do it with sed or awk:

sed -n '/TERMINATE/,$p' file

This looks for TERMINATE in your file and prints from that line up to the end of the file.

awk '/TERMINATE/,0' file

This is exactly the same behaviour as sed.

If you know the number of the line from which you want to start printing, you can specify it with NR (the number of the current record, which by default is the line number):

awk 'NR>=535' file

Example

$ seq 10 > a        #generate a file with one number per line, from 1 to 10
$ sed -n '/7/,$p' a
7
8
9
10
$ awk '/7/,0' a
7
8
9
10
$ awk 'NR>=7' a
7
8
9
10
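When the starting line number lives in a shell variable, the same three tools can take it as a parameter. This is a minimal sketch. The variable name `start` and the sample file are illustrative, and the `\$` escape is needed only because the sed script is in double quotes.

```shell
sample=$(mktemp)
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > "$sample"

start=7

# Pass the number into awk as a variable rather than editing the script.
via_awk=$(awk -v n="$start" 'NR >= n' "$sample")

# Double quotes let the shell expand $start; \$ keeps sed's last-line address.
via_sed=$(sed -n "${start},\$p" "$sample")

# tail counts from the start of the file when the number has a leading +.
via_tail=$(tail -n "+$start" "$sample")

printf '%s\n' "$via_awk"
rm -f "$sample"
```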

answered Jul 31 '14 at 10:40 by fedorqui

For the number you can also use `more +7 file` – 123 Jun 03 '15 at 14:41
This includes the matching line, which is not what is wanted in this question. – mivk Jul 23 '16 at 16:51
@mivk well, this is also the case of the accepted answer and the 2nd most upvoted, so the problem may be with a misleading title. – fedorqui Jul 23 '16 at 21:36

score 3 · Answer 8 · edited Oct 29 '21 at 16:05

If, for any reason, you want to avoid using sed, the following will print from the first line matching TERMINATE to the end of the file:

tail -n "+$(grep -n 'TERMINATE' file | head -n 1 | cut -d ":" -f 1)" file

And the following will print from the line following the first line matching TERMINATE to the end of the file:

tail -n "+$(($(grep -n 'TERMINATE' file | head -n 1 | cut -d ":" -f 1)+1))" file

It takes several processes to do what sed can do in one, and if the file changes between the execution of grep and tail, the result can be incoherent, so I recommend using sed. Moreover, if the file does not contain TERMINATE, the first command fails.
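If this approach is used anyway, the missing-match case can be guarded explicitly. The helper below is a hypothetical wrapper (the name `print_after_match` and its argument order are my own) around the same grep, head, cut and tail pipeline. It returns a non-zero status instead of passing an empty line number to tail.

```shell
# print_after_match FILE PATTERN
# Hypothetical helper: prints the lines after the first line matching PATTERN.
print_after_match() {
    match_line=$(grep -n -e "$2" "$1" | head -n 1 | cut -d ':' -f 1)
    if [ -z "$match_line" ]; then
        printf 'no line matching %s in %s\n' "$2" "$1" >&2
        return 1
    fi
    tail -n "+$((match_line + 1))" "$1"
}

sample=$(mktemp)
printf '%s\n' 'alpha' 'TERMINATE' 'gamma' 'delta' > "$sample"

after=$(print_after_match "$sample" 'TERMINATE')

# Record the status for a pattern that is not in the file.
if print_after_match "$sample" 'NO_SUCH_MARKER' 2> /dev/null; then
    missing_status=0
else
    missing_status=1
fi

printf '%s\n' "$after"
rm -f "$sample"
```

The file is still read twice, so the caveats above about large or changing files continue to apply.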

File is scanned twice. What if it is 100GB in size? — Znik, Dec 22 '14 at 15:01

score 0 · Answer 9 · edited Oct 29 '21 at 16:26

Alternatives to the excellent sed answer by jfg956, and which don't include the matching line:

awk '/TERMINATE/ {y=1;next} y' (Hai Vu's answer to 'grep +A': print everything after a match)
awk '/TERMINATE/ ? c++ : c' (Steven Penny's answer to 'grep +A': print everything after a match)
perl -ne 'print unless 1 .. /TERMINATE/' (tchrist's answer to 'grep +A': print everything after a match)

answered Jul 23 '16 at 17:02 by mivk · edited Oct 29 '21 at 16:26 by Peter Mortensen

score 0 · Answer 10 · edited Oct 29 '21 at 17:33

This could be one way of doing it. If you know in what line of the file you have your grep word and how many lines you have in your file:

grep -A466 'TERMINATE' file

answered Jan 25 '17 at 00:41 by Mariah · edited Oct 29 '21 at 17:33 by Peter Mortensen

If the line number is known, then `grep` isn't even required; you can just use `tail -n $NUM`, so this isn't really an answer. – Samveen May 22 '17 at 07:04

score 0 · Answer 11 · answered Mar 14 '23 at 13:12

In my Bash command I look for marker lines in the text file log.txt. My marker is #mark1678793202693, and it occurs twice in the file. I always want to print the block between the two occurrences of the marker.

$a holds the line numbers of all matching lines in log.txt

$aro is $a converted to an array

$s is the start line, the first line that matches the pattern

$e is the end line, the second line that matches the pattern in log.txt

Now you can use sed to print from line number $s to line number $e:

a=$(awk '/#mark1678793202693/{print NR}' log.txt); aro=($a); s=${aro[0]}; e=${aro[1]}; sed -n -e "${s},${e}p" log.txt
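The same block can also be extracted in a single awk pass, without collecting line numbers first. This is a sketch of an alternative, not the method described above. It matches the marker as a literal string and prints from the first occurrence through the second, both included. The sample contents and variable names are illustrative.

```shell
sample=$(mktemp)
printf '%s\n' 'noise' '#mark1678793202693' 'inside a' 'inside b' \
    '#mark1678793202693' 'trailing' > "$sample"

marker='#mark1678793202693'

block=$(awk -v m="$marker" '
    # On a marker line: count it, print it, and stop after the second one.
    index($0, m) { count++; print; if (count == 2) exit; next }
    # Between the first and second marker: print the line.
    count == 1
' "$sample")

printf '%s\n' "$block"
rm -f "$sample"
```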

score -1 · Answer 12 · edited Oct 29 '21 at 15:58

sed is a much better tool for the job:

sed -n '/re/,$p' file

where re is a regular expression.

Another option is grep's --after-context flag. You need to pass in the number of lines to print; running wc -l on the file gives a value that is large enough to reach the end. Combine this with -n and your match expression.

answered Aug 18 '11 at 07:09 by ckwang · edited Oct 29 '21 at 15:58 by Peter Mortensen

--after-context is fine but not in all cases. – Yugal Jindle Aug 18 '11 at 07:14
Can you suggest something else.. ?? – Yugal Jindle Aug 18 '11 at 07:15

score -2 · Answer 13 · edited Oct 29 '21 at 16:15

This will print all lines from the last found line "TERMINATE" till the end of the file:

LINE_NUMBER=`grep -o -n TERMINATE $OSCAM_LOG | tail -n 1 | sed "s/:/ \\'/g" | awk -F" " '{print $1}'`
tail -n +$LINE_NUMBER $YOUR_FILE_NAME
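For comparison, printing from the last matching line can also be sketched as a single awk pass that restarts a buffer at every match. This is an alternative technique, not the method above, and it assumes the text after the last match fits comfortably in memory. The sample contents and the name `from_last` are illustrative.

```shell
sample=$(mktemp)
printf '%s\n' 'old' 'TERMINATE' 'middle' 'TERMINATE' 'newest 1' 'newest 2' > "$sample"

from_last=$(awk '
    # Every match discards what was collected so far and starts over.
    /TERMINATE/ { buf = ""; seen = 1 }
    # Once a match has been seen, collect each line (including the match).
    seen { buf = buf $0 ORS }
    # At end of input the buffer holds the last match and all lines after it.
    END { printf "%s", buf }
' "$sample")

printf '%s\n' "$from_last"
rm -f "$sample"
```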

answered Feb 13 '16 at 21:52 by easyyu · edited Oct 29 '21 at 16:15 by Peter Mortensen

Extracting a line number with `grep` so you can feed it to `tail` is a wasteful antipattern. Finding the match and printing up through the end of the file (or, conversely, printing and stopping at the first match) is eminently done with the normal, essential regex tools themselves. The massive `grep | tail | sed | awk` is also in and of itself a massive [useless use of `grep` and friends](http://www.iki.fi/era/unix/award.html#grep). – tripleee Feb 17 '16 at 15:03
I think s*he was trying to give us something that would find the /last instance/ of 'TERMINATE' and give the lines from that instance on. Other implementations give you the first instance onward. The LINE_NUMBER should probably look like this, instead: LINE_NUMBER=$(grep -o -n 'TERMINATE' $OSCAM_LOG | tail -n 1| awk -F: '{print $1}') Maybe not the most elegant way, but it seems to get the job done. ^.^ – fbicknel Jul 01 '16 at 21:09
... or all in one line, but ugly: tail -n +$(grep -o -n 'TERMINATE' $YOUR_FILE_NAME | tail -n 1| awk -F: '{print $1}') $YOUR_FILE_NAME – fbicknel Jul 01 '16 at 21:17
.... and I was going to go back and edit out $OSCAM_LOG in lieu of $YOUR_FILE_NAME... but can't for some reason. No idea where $OSCAM_LOG came from; I just mindlessly parroted it. o.O – fbicknel Jul 01 '16 at 21:19
Doing this in Awk alone is a common task in Awk 101. If you are already using a more capable tool just to get the line number, let go of `tail` and do the task in the more capable tool altogether. Anyway, the title clearly says "first match". – tripleee Jul 25 '16 at 12:30
