How to delete all the lines after the last occurrence of a pattern?

Question

I want to delete all the lines after the last occurrence of a pattern, keeping the matching line itself.

file.txt

honor
apple
redmi
nokia
apple
samsung
lg
htc

file.txt (what I want)

honor
apple
redmi
nokia
apple

What I have tried

sed -i '/apple/q' file.txt

This deletes all the lines after the first occurrence of the pattern:

honor

Reverse the file, delete everything before the pattern, then reverse the result. – Barmar Jun 01 '17 at 13:16
The file is actually very big. Will that cause any issues? Is there a more efficient way? – j.doe Jun 01 '17 at 13:19
What should it do if the string isn't found: print everything or nothing? – Barmar Jun 01 '17 at 14:37

Ed Morton · Accepted Answer · 2017-06-01T14:36:07.017

7

Simple, robust 2-pass approach using almost no memory:

$ awk 'NR==FNR{if (/apple/) hit=NR; next} {print} FNR==hit{exit}' file file
honor
apple
redmi
nokia
apple

If that doesn't execute fast enough THEN it's time to try some alternatives to see if any produce a performance improvement.
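To unpack the one-liner above, here is a commented sketch of the same two-pass idea wrapped in a shell function. The function name `trim_after_last`, the `pat` variable, and the scratch file `sample.txt` are assumptions made for illustration and are not part of the original answer.

```shell
# Hypothetical helper: print FILE up to and including the last line
# that matches PATTERN (an awk regular expression).
trim_after_last() {
    pattern=$1 file=$2
    awk -v pat="$pattern" '
        # Pass 1: NR (lines read overall) equals FNR (lines read in the
        # current file) only while the first file argument is being read.
        # Remember the number of the most recent matching line; print nothing.
        NR == FNR { if ($0 ~ pat) hit = NR; next }

        # Pass 2: the same file, named a second time. Print every line ...
        { print }

        # ... and stop right after the remembered line. If nothing matched,
        # hit stays unset and the whole file is printed.
        FNR == hit { exit }
    ' "$file" "$file"
}

# Usage with the sample data from the question:
printf '%s\n' honor apple redmi nokia apple samsung lg htc > sample.txt
trim_after_last apple sample.txt
```

Naming the file twice is what gives awk two passes over it: the first pass only records a line number, so memory use stays constant regardless of file size.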

edited Jun 01 '17 at 14:36

answered Jun 01 '17 at 14:04

Ed Morton


Why does the filename argument need to be passed in twice? Can you explain the mechanics of what awk is doing under the covers in this case? – Jay Taylor Jan 31 '19 at 01:09

score 5 · Answer 2 · answered Jun 01 '17 at 13:19

5

Reverse the file, print everything starting from the first occurrence of the pattern, then reverse the result:

tac file.txt | sed -n '/apple/,$p' | tac > newfile.txt

You can find the line number of the last match, then use that to print the first N lines of the file:

line=$(awk '/apple/ { line=NR } END {print line}' file.txt)
head -n "$line" file.txt > newfile.txt
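The line-number approach can also handle a file in which the pattern never appears. The sketch below assumes that "no match" should mean "keep the whole file", which the question does not specify; the function name `keep_through_last_match` is hypothetical.

```shell
# Hypothetical helper: $1 = pattern (awk regex), $2 = input file.
keep_through_last_match() {
    # n + 0 forces a numeric result, so an unmatched pattern yields 0.
    last=$(awk -v pat="$1" '$0 ~ pat { n = NR } END { print n + 0 }' "$2")
    if [ "$last" -gt 0 ]; then
        head -n "$last" "$2"
    else
        cat "$2"    # assumption: no match means the file is left unchanged
    fi
}

printf '%s\n' honor apple redmi nokia apple samsung lg htc > sample.txt
keep_through_last_match apple sample.txt > newfile.txt
```

Swapping `cat "$2"` for `:` would give the opposite policy, where an unmatched pattern produces no output.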

answered Jun 01 '17 at 13:19

Barmar


If `apple` didn't exist in the file then those scripts would produce no output but I'd think the expected behavior would be to print the whole file. You need to add `file.txt` at the end of the awk script btw. – Ed Morton Jun 01 '17 at 14:35

score 1 · Answer 3 · answered Jun 01 '17 at 13:25

1

If you don't want to reverse the file as Barmar suggests, you will either have to read the file in reverse using lower-level tools (e.g., fseek) or read it twice:

sed $(awk '/apple/{a=NR}END{print a+1}' input),\$d input

(Note that if the pattern does not appear in the file, this will output nothing. That's an edge case you should worry about.)

answered Jun 01 '17 at 13:25

William Pursell


Well spotted on the edge case! You could fix that with `print (a?a:NR)+1`, of course, though I personally prefer to identify the lines to print rather than the lines to delete (feels like more "positive" logic to me), e.g.: `sed -n 1,$(awk '/apple/{a=NR}END{print (a?a:NR)}' input)p input`. Some enclosing quotes would be nice, and you could use `head` instead of `sed`. – Ed Morton Jun 01 '17 at 15:02

potong · Answer 4 · 2017-06-02T21:55:29.450

This might work for you (GNU sed):

sed '/apple/,$!b;//!H;//{x;//p;x;h};${x;P};d' file

Lines before the first appearance of apple are printed as usual. From that point to the end of the file, lines that do not contain apple are appended to the hold space (HS). For a line that does contain apple, swap to the HS and print its contents if they contain apple, then replace the HS with the current line. Every line in the range is then deleted instead of being printed automatically. At the end of the file, print only the first line of the HS and discard the rest.
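The same script may be easier to follow when spread over several lines with comments. This is only a reformatted sketch of the command above, wrapped in a hypothetical function named `sed_trim_after_last_apple`; comments are kept on their own lines so they cannot be mistaken for a label after `b`.

```shell
# Hypothetical wrapper: $1 = input file; the pattern "apple" is hard-coded.
sed_trim_after_last_apple() {
    sed '
# Lines before the first "apple" are outside the range: print them as usual.
/apple/,$!b
# Inside the range, a line without "apple" is appended to the hold space.
# The empty regex // reuses the last regex, which is /apple/.
//!H
# A line with "apple": print the held block if it contains "apple",
# then restart the hold space with the current line.
//{
    x
    //p
    x
    h
}
# On the last line, print only the first line of the hold space.
${
    x
    P
}
# Nothing inside the range is printed automatically.
d
' "$1"
}

printf '%s\n' honor apple redmi nokia apple samsung lg htc > sample.txt
sed_trim_after_last_apple sample.txt
```

The hold space only ever contains the lines since the most recent match, so memory use depends on the largest gap between matches (or the tail after the last one), not on the whole file.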

If slurping a large file is not a problem use:

sed -rz 's/(.*apple[^\n]*).*/\1\n/' file

This relies on greedy matching to capture everything up to and including the last line that contains the word apple.

score 0 · Answer 5 · answered Jun 01 '17 at 13:54

0

Here is another awk solution that does not scan the file twice:

$ awk 'f       {buf=buf ORS $0} 
       /apple/ {f=1; if(buf)print buf; buf=$0} 
       !f' file

honor
apple
redmi
nokia
apple

answered Jun 01 '17 at 13:54

karakfa


That would produce unexpected output if `apple` appeared on the line immediately before the last `apple` line in the input (it would be printed twice). The OP also mentioned in a comment that he's concerned about his file being big so YMMV with storing blocks of it in memory as it might work today and then fail later for some other distribution of "apple"s. – Ed Morton Jun 01 '17 at 14:23
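One possible way to avoid the duplicated output while still reading the file only once is to print each matching line immediately and buffer only the non-matching lines that follow it. This is a sketch, not karakfa's code; the function name `trim_one_pass` and the variable names are assumptions, and it prints the whole file when the pattern never matches.

```shell
# Hypothetical helper: $1 = pattern (awk regex), $2 = input file.
trim_one_pass() {
    awk -v pat="$1" '
        # A match shows that the buffered lines come before the last match:
        # flush them, print the matching line, and start an empty buffer.
        $0 ~ pat { printf "%s", buf; print; buf = ""; seen = 1; next }

        # After the first match, hold lines back until another match appears.
        seen { buf = buf $0 ORS; next }

        # Before the first match, lines can be printed straight away.
        { print }
    ' "$2"
}

printf '%s\n' honor apple redmi nokia apple samsung lg htc > sample.txt
trim_one_pass apple sample.txt
```

The memory concern raised in the comment still applies: everything after the last match is held in `buf` until the end of the input and then discarded, so a long tail means a large buffer.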

dawg · Answer 6 · 2017-06-01T15:35:03.943

0

If you don't mind having everything in memory, you can do:

$ awk '/^apple$/{last=NR} 
              {lines[NR]=$0}
     END{for(li=1;li<=last;li++) print lines[li]}' file
honor
apple
redmi
nokia
apple

edited Jun 01 '17 at 15:35

answered Jun 01 '17 at 15:29

dawg


Thor · Answer 7 · 2017-06-01T19:35:49.477

0

Given that you are dealing with large input I would go with a two-pass coreutils solution, e.g.:

n=$(grep -Fn apple infile | tail -n1 | cut -d: -f1)
[ -n "$n" ] && head -n$n infile > outfile

This uses grep's fixed-string matching (-F) with line numbers (-n) to list every line containing apple; tail and cut pick out the line number of the last match, and head then extracts the lines up to and including it.

You did not specify what happens when no apples are found, so this solution does nothing when that occurs.

edited Jun 01 '17 at 19:35

answered Jun 01 '17 at 15:40

Thor


You missed a closing ')' (I can't edit your answer by myself) – niglesias Jun 01 '17 at 17:21
