5

I want to delete all the lines after the last occurrence of a pattern, keeping the matching line itself.

file.txt

honor
apple
redmi
nokia
apple
samsung
lg
htc

file.txt (what I want)

honor
apple
redmi
nokia
apple

What I have tried:

sed -i '/apple/q' file.txt

This deletes all the lines after the first occurrence of the pattern:

honor
apple
Thor
j.doe

7 Answers

7

Simple, robust 2-pass approach using almost no memory:

$ awk 'NR==FNR{if (/apple/) hit=NR; next} {print} FNR==hit{exit}' file file
honor
apple
redmi
nokia
apple

If that doesn't execute fast enough THEN it's time to try some alternatives to see if any produce a performance improvement.

Ed Morton
  • Why does the filename argument need to be passed in twice? Can you explain the mechanics of what awk is doing under the covers in this case? – Jay Taylor Jan 31 '19 at 01:09
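On the question of why the file name is passed twice: awk reads its file operands one after another, so naming the same file twice makes it read the file two times. NR counts records across all input while FNR restarts at 1 for each file, so NR==FNR is true only during the first read. The sketch below prints those counters and then runs the two-pass command on the sample data from the question. The scratch directory and the variable names (tmpdir, counters, result, pass) are only illustrative.

```shell
# Recreate the sample file from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' honor apple redmi nokia apple samsung lg htc > "$tmpdir/file.txt"

# Show how NR and FNR evolve when the same file is named twice.
# NR == FNR holds only while awk is reading the first operand.
counters=$(awk '{ pass = (NR == FNR) ? "pass1" : "pass2"; print pass, NR, FNR }' \
    "$tmpdir/file.txt" "$tmpdir/file.txt")
printf '%s\n' "$counters"

# Pass 1 records the line number of the last match and prints nothing.
# Pass 2 prints each line and exits once that line number is reached.
result=$(awk 'NR == FNR { if (/apple/) hit = NR; next }
              { print }
              FNR == hit { exit }' "$tmpdir/file.txt" "$tmpdir/file.txt")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

Because hit stays unset when nothing matches, FNR==hit is never true in that case and the second pass prints the whole file.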
5

Reverse the file, print everything starting from the first occurrence of the pattern, then reverse the result:

tac file.txt | sed -n '/apple/,$p' | tac > newfile.txt

Alternatively, you can find the line number of the last match, then use that to print the first N lines of the file:

line=$(awk '/apple/ { line=NR } END {print line}' file.txt)
head -n "$line" file.txt > newfile.txt
Barmar
  • If `apple` didn't exist in the file then those scripts would produce no output but I'd think the expected behavior would be to print the whole file. You need to add `file.txt` at the end of the awk script btw. – Ed Morton Jun 01 '17 at 14:35
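A minimal sketch of the fallback described in that comment, assuming the desired behavior really is to print the whole file when the pattern never matches (the question does not say). The function name keep_through_last and its arguments are hypothetical. Note that awk -v interprets backslash escapes in the assigned value, so patterns containing backslashes would need extra care.

```shell
# Hypothetical helper: print everything up to and including the last line
# matching the extended regex $1 in file $2. If nothing matches, print the
# whole file (an assumed policy, not something the question specifies).
keep_through_last() {
    pattern=$1
    file=$2
    # "line + 0" forces a numeric 0 when the pattern never matched.
    line=$(awk -v pat="$pattern" '$0 ~ pat { line = NR } END { print line + 0 }' "$file")
    if [ "$line" -gt 0 ]; then
        head -n "$line" "$file"
    else
        cat "$file"
    fi
}

# Try it on the sample data from the question.
tmpdir=$(mktemp -d)
printf '%s\n' honor apple redmi nokia apple samsung lg htc > "$tmpdir/file.txt"
with_match=$(keep_through_last apple "$tmpdir/file.txt")
without_match=$(keep_through_last pear "$tmpdir/file.txt")
printf '%s\n' "$with_match"
rm -rf "$tmpdir"
```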
1

If you don't want to reverse the file as Barmar suggests, you will have to either read the file in reverse using lower-level tools (e.g., fseek) or read it twice:

sed "$(awk '/apple/{a=NR} END{print a+1}' input),\$d" input

(Note that if the pattern does not appear in the file, this will output nothing. That's an edge case you should worry about.)

William Pursell
  • Well spotted on the edge case! You could fix that with `print (a?a:NR)+1`, of course, though I personally prefer to identify the lines to print rather than the lines to delete (it feels like more "positive" logic to me), e.g. `sed -n "1,$(awk '/apple/{a=NR}END{print (a?a:NR)}' input)p" input`. You could also use `head` instead of `sed`. – Ed Morton Jun 01 '17 at 15:02
1

This might work for you (GNU sed):

sed '/apple/,$!b;//!H;//{x;//p;x;h};${x;P};d' file

Lines before the first appearance of apple fall outside the range /apple/,$ and are printed as usual (b branches to the end of the script). Within the range, lines that do not contain apple are appended to the hold space (HS). When a line does contain apple, the HS is swapped in and printed if it holds a block that starts with an earlier apple line; the HS is then replaced with the current line. On the last line of the file, the first line of the HS (the last apple line) is printed. Every line in the range is then deleted from the pattern space, so nothing after the last apple is output.

If slurping the whole file into memory is not a problem, use:

sed -rz 's/(.*apple[^\n]*).*/\1\n/' file

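This relies on greedy matching: with -z the whole file is treated as a single record, and the leading .* consumes as much as possible, so the captured group ends at the last line containing apple.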

potong
0

Here is another awk solution that does not scan the file twice:

$ awk 'f       {buf=buf ORS $0} 
       /apple/ {f=1; if(buf)print buf; buf=$0} 
       !f' file

honor
apple
redmi
nokia
apple
karakfa
  • That would produce unexpected output if `apple` appeared on the line immediately before the last `apple` line in the input (it would be printed twice). The OP also mentioned in a comment that he's concerned about his file being big so YMMV with storing blocks of it in memory as it might work today and then fail later for some other distribution of "apple"s. – Ed Morton Jun 01 '17 at 14:23
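One way to avoid the duplicate while still reading the file once is to print each matching line as soon as it is seen and buffer only the lines that follow it. As far as I can trace it, the script above restarts its buffer with the matching line it has just printed, so with three or more matches a middle match comes out twice. The following is a sketch of the alternative, not the answerer's code; the file three.txt and the variable names are made up for the demonstration. It still holds every line between two matches in memory, so the concern about big files remains.

```shell
# Sample input with three matches, which is where duplicates can show up.
tmpdir=$(mktemp -d)
printf '%s\n' honor apple redmi apple nokia apple samsung > "$tmpdir/three.txt"

# On a match: flush the lines buffered since the previous match, print the
# matching line itself, and start an empty buffer.
# After the first match: buffer non-matching lines instead of printing them.
# Before the first match: print lines straight through.
fixed=$(awk '/apple/ { printf "%s", buf; print; buf = ""; seen = 1; next }
             seen    { buf = buf $0 ORS; next }
             { print }' "$tmpdir/three.txt")
printf '%s\n' "$fixed"

rm -rf "$tmpdir"
```

Lines buffered after the final match are never flushed, which is what drops the tail of the file.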
0

If you don't mind having everything in memory, you can do:

$ awk '/^apple$/ {last=NR}
       {lines[NR]=$0}
       END {for (li=1; li<=last; li++) print lines[li]}' file
honor
apple
redmi
nokia
apple
dawg
0

Given that you are dealing with large input I would go with a two-pass coreutils solution, e.g.:

n=$(grep -Fn apple infile | tail -n1 | cut -d: -f1)
[ -n "$n" ] && head -n "$n" infile > outfile

This uses grep's fixed-string matching (-F) with line numbers (-n) to list every line containing apple; tail -n1 keeps the last match and cut extracts its line number. head then prints the first n lines.

You did not specify what happens when no apples are found, so this solution does nothing when that occurs.
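Since the question edits the file in place with sed -i, here is a hedged sketch of how the same two-pass idea could overwrite the original file. It writes to a temporary file first, because redirecting head's output straight into the file it is reading would truncate the input. The scratch directory and the .new suffix are illustrative choices.

```shell
# Recreate the sample file from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' honor apple redmi nokia apple samsung lg htc > "$tmpdir/file.txt"

# Line number of the last line containing the fixed string "apple".
n=$(grep -Fn apple "$tmpdir/file.txt" | tail -n1 | cut -d: -f1)

# Only touch the file when a match exists. Write the kept lines to a
# temporary file, then replace the original if head succeeded.
if [ -n "$n" ]; then
    head -n "$n" "$tmpdir/file.txt" > "$tmpdir/file.txt.new" &&
        mv "$tmpdir/file.txt.new" "$tmpdir/file.txt"
fi

result=$(cat "$tmpdir/file.txt")
printf '%s\n' "$result"
rm -rf "$tmpdir"
```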

Thor