I have a file containing the following lines:

aaa
bbb
ccc
pattern
eee
fff
ggg
pattern
hhh

I would like to delete 2 lines before the last matching pattern in the file. The expected output is:

aaa
bbb
ccc
pattern
eee
pattern
hhh

I tried - sed -i '/pattern/{N;N;d;}' file but it didn't work. There was no change to the file.

I also tried - tac file | sed '/pattern/,+2 d' | tac > tmpfile && mv tmpfile file but this also deleted the line containing the matching pattern.

My sed version is sed (GNU sed) 4.7.

Any help would be much appreciated. Thanks.

Ulrich Eckhardt
Ira
  • If your first command didn't change the file, it means the pattern didn't match. Had it matched it would have deleted 3 lines (none of them being the ones you wanted deleted). – jhnc Jun 10 '23 at 05:23
  • how large is the file? – jhnc Jun 10 '23 at 05:30
  • I tried adding 4 lines to the file `temp.txt` - ```aaa bbb ccc ddd``` on each line and then ran `sed -i '/ccc/{N;N;d;}' temp.txt`. The file was as is after running the command. The file is around 350K. It contains 6500 lines. – Ira Jun 10 '23 at 05:33

8 Answers

3

sed is the wrong tool for this. Any time you want to edit a file, especially if you want to look backwards after some matching bit, ed is almost always a better option, as it's designed to work with files, not a stream of lines always moving forward.

ed -s file.txt <<'EOF'
?pattern?-2;+1 d
w
EOF

or if a heredoc isn't convenient

printf '%s\n' '?pattern?-2;+1 d' w | ed -s file.txt

This first sets the current line to the one two lines before the last line matching pattern, then deletes that line and the one following (so the two lines preceding that last match of pattern), and finally writes the modified file back out.

Shawn
1

Edit: HatLess's sed solution looks much better to me.

I agree with Shawn's answer; sed is not the best tool for the job. But here's a solution with sed:

Have a script.sed file:

# read the full file into the pattern space
:1
$! { N ; b1 }

# replace last occurrence of "2 lines plus pattern line"
# with just the pattern line
s/(.*\n.*\n)(pattern\n?.*)\'/\2/m

Run it like this:

sed -E -f script.sed file.txt

Or in a single line like this:

sed -E ':1 ; $! { N ; b1 } ; s/(.*\n.*\n)(pattern\n?.*)\'\''/\2/m' file.txt

The basic idea is that, because we need to work on the last occurrence of the pattern in the file, we need to read the entire file before modifying it.

The first two lines are a loop using sed's goto-like commands:

  • :1 creates a label called 1.
  • $! makes sure that we run the following commands for every line except the last one.
    • N reads the next line.
    • b1 jumps to the label 1.
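To see the loop on its own, here is a minimal sketch that slurps a three-line input into the pattern space and then joins the lines with commas. The sample input, the joined variable and the comma substitution are illustrative assumptions, not part of the solution. Each command is passed with its own -e so the label is never confused with the text after it.

```shell
# Slurp all input lines into the pattern space, then act once at the end.
joined=$(printf 'a\nb\nc\n' |
    sed -e ':1' -e '$!{' -e 'N' -e 'b1' -e '}' -e 's/\n/,/g')
echo "$joined"   # prints: a,b,c
```

The substitution only runs once, after the last line has been appended, which is why it sees every newline of the input.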

The following substitution command will only run on the last line. Note the following:

  • We don't need to escape the capturing group parentheses (\( and \)) because we call sed with the -E flag which turns on the Extended Regular Expression syntax.
  • We pass the flag m to the substitute command, which makes the regex work in multiline mode. In our case, this provides the following characteristics:
    • The dot (.) no longer matches newline characters (\n). This is useful in our case because we want to be explicit about the number of lines we match.
    • It makes $ match at the end of every line, so we use the special \' anchor instead (a GNU extension), which always matches the end of the buffer. We need this to anchor our regex to the end of the file.
  • Also note the \n? after pattern. Because sed stores lines without the trailing newline, this lets the regex match when "pattern" is the last line of the file or is followed by exactly one more line, as in the sample.
1

Using GNU sed

$ sed -Ezi.bak 's/(.*\n)([^\n]*\n){2}(pattern)/\1\3/' input_file
$ cat input_file
aaa
bbb
ccc
pattern
eee
pattern
hhh
HatLess
0

I would use GNU AWK for this task in the following way. Let file.txt content be

aaa
bbb
ccc
pattern
eee
fff
ggg
pattern
hhh

then

awk '{arr[NR]=$0}/pattern/{ln=NR}END{for(i=1;i<=NR;i+=1){if(i+2!=ln&&i+1!=ln){print arr[i]}}}' file.txt

gives output

aaa
bbb
ccc
pattern
eee
pattern
hhh

Explanation: I store the lines of file.txt in array arr, keyed by line number; whenever a line matches pattern, I set the ln variable to that line's number, so at the end it holds the number of the last match. After all lines are stored, I iterate over arr and print the lines whose numbers are neither ln minus 1 nor ln minus 2.

(tested in GNU Awk 5.1.0)
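The same idea can be sketched with the pattern and the number of lines passed in as variables. The names pat, n, last and line are illustrative and not from the command above; the sample file is created in a scratch directory only so the sketch is self-contained.

```shell
# Build the sample input in a scratch directory (hypothetical path).
dir=$(mktemp -d)
printf '%s\n' aaa bbb ccc pattern eee fff ggg pattern hhh > "$dir/file.txt"

# Buffer every line, remember the number of the last match, and at the
# end skip the n lines immediately before it.
result=$(awk -v pat='pattern' -v n=2 '
    { line[NR] = $0 }
    $0 ~ pat { last = NR }
    END {
        for (i = 1; i <= NR; i++)
            if (!(last && i >= last - n && i < last))
                print line[i]
    }' "$dir/file.txt")
printf '%s\n' "$result"
rm -r "$dir"
```

If no line matches, last stays unset and every line is printed unchanged.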

Daweo
0

This might work for you (GNU sed):

sed -En ':a
         N
         /(.*(pattern))\n?(.*\2)/{h;s//\1/p;x;s//\3/}
         ${s/([^\n]*\n){2}(pattern)/\2/;p}
         ba' file

Gather up the lines of the file.

If the collection contains two occurrences of the pattern, print up to and including the first pattern, then reduce the current collection by the same amount (minus an introduced leading newline).

At the end of the file, match on pattern again, this time removing the two lines before it and print the result.

Alternative:

sed -zE 's/(.*)(\n[^\n]*){2}(\npattern)/\1\3/' file
potong
0

tac + sed

tac infile | sed -n '
    p
    /pattern/ {
        n
        n
    :a
        n
        p
        ba
    }
' | tac >tmpfile &&
mv tmpfile infile

sed + shell

(
    n=$(sed -n '/pattern/=' infile | tail -n 1)
    sed -i "$((n-2)),$((n-1))d" infile
)
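If no line matches, n is empty, the arithmetic expands to negative numbers, and the sed command fails. A guarded variant might look like the sketch below. It assumes the edit should be skipped unless two full lines precede the match, and it writes through a temporary file rather than relying on -i; the scratch directory and file names are placeholders.

```shell
dir=$(mktemp -d)
infile=$dir/infile
printf '%s\n' aaa bbb ccc pattern eee fff ggg pattern hhh > "$infile"

# Line number of the last match; empty when nothing matches.
n=$(sed -n '/pattern/=' "$infile" | tail -n 1)

# Only edit when there are two full lines before the match.
if [ -n "$n" ] && [ "$n" -gt 2 ]; then
    sed "$((n-2)),$((n-1))d" "$infile" > "$infile.tmp" &&
        mv "$infile.tmp" "$infile"
fi
cat "$infile"
```

With the sample data n is 8, so lines 6 and 7 are deleted.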
jhnc
0

Using any awk with tac and only reading 1 line at a time into memory:

$ tac file | awk '!(c && c--); !f && /pattern/{f=c=2}' | tac
aaa
bbb
ccc
pattern
eee
pattern
hhh

Most of the other posted solutions are reading the whole input into memory and so will fail if the input is too large to fit in memory.
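The awk one-liner is terse, so here is the same logic written out, as a sketch. To keep it runnable without tac, the sample lines are fed in already reversed; skip and seen are illustrative names standing in for c and f.

```shell
# Input is the sample file in reverse order, as tac would emit it.
out=$(printf '%s\n' hhh pattern ggg fff eee pattern ccc bbb aaa |
    awk '
        skip > 0 { skip--; next }                  # drop a line that follows the match
        { print }                                  # everything else passes through
        !seen && /pattern/ { seen = 1; skip = 2 }  # first match here is the last match in the file
    ')
printf '%s\n' "$out"
```

Only one line and two counters are held at any time; piping the result through tac again restores the original order.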

Ed Morton
  • However `| tac` snarfs everything into a temporary file and then invokes its algorithm for reversing a file. I think the ideal approach here in terms of resources would be to make two passes over the file: one to identify where the unwanted lines occur and then another to filter them out by position. – Kaz Jun 10 '23 at 16:25
  • @Kaz ah, that makes sense, thanks. Still avoids the potential memory issue of course but you're right it'd introduce a disk space one. – Ed Morton Jun 10 '23 at 16:52
0

We can avoid using temporary files (e.g. via tac) or loading the file into RAM, or editing in-place, if we make two passes over it:

$ awk 'NR == FNR && /pattern/ { pos = NR }
       NR == FNR { next }
       FNR < pos - 2 || FNR >= pos' data data
aaa
bbb
ccc
pattern
eee
pattern
hhh

Here, I'm giving the data file twice as an argument on awk's command line. The condition NR == FNR is an idiom in awk which evaluates to true when we are processing the first file (thus, in our case, the first pass over the same file).

In the first pass, we record the line number of the last line which matches pattern, simply by storing the position of every matching line into the same pos variable, so that the last match overwrites the earlier ones.

In the second pass through the data, we print all lines which are not one of the two lines before pos.
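A sketch of the same two-pass filter with the pattern and the line count as parameters follows. The names pat and n are hypothetical, and the sample file lives in a scratch directory just to make the example self-contained.

```shell
dir=$(mktemp -d)
data=$dir/data
printf '%s\n' aaa bbb ccc pattern eee fff ggg pattern hhh > "$data"

# Pass 1 (NR == FNR): remember the last matching line number.
# Pass 2: print every line except the n lines just before it.
filtered=$(awk -v pat='pattern' -v n=2 '
    NR == FNR { if ($0 ~ pat) pos = FNR; next }
    FNR < pos - n || FNR >= pos
' "$data" "$data")
printf '%s\n' "$filtered"
rm -r "$dir"
```

When nothing matches, pos stays unset (numerically 0), so the second condition is true for every line and the input is printed unchanged.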

Kaz
  • When Butler Lampson said that all problems in computer science can be solved by adding another layer of indirection, he totally forgot about the power of adding another pass. :) – Kaz Jun 10 '23 at 16:58
  • The issue with that is it wouldn't work if the input was coming from a pipe rather than a file BUT the OP did say "I have a file..." so it's probably OK for them and anyone can always create a temp file to work on if necessary (assuming enough disk space). – Ed Morton Jun 10 '23 at 16:59