SED, deleting lines between the patterns

Question

This is about using sed to delete the lines between two patterns, while keeping the lines that contain the patterns themselves.

If the second pattern occurs two or more times, I want the lines to be deleted up to the last occurrence of the second pattern.

How would I do that?

Show sample input and your desired output for that sample input. — Cyrus, Jun 27 '15 at 19:07
It would be nice if you gave a link in your question to an example on https://regex101.com/ — Nassim, Jun 27 '15 at 19:28
Better not to try to hide all the details from us. You may not get your answer, while wasting our time. — Jason Hu, Jun 28 '15 at 00:08
Possible duplicate of [Using sed to delete all lines between two matching patterns](https://stackoverflow.com/q/6287755/608639), [SED delete lines between two pattern matches](https://stackoverflow.com/q/8085633/608639), [sed delete lines between two patterns, without the second pattern, including the first pattern](https://stackoverflow.com/q/42898905/608639), [SED delete specific lines between two patterns?](https://stackoverflow.com/q/19233578/608639) and friends. — jww, Dec 23 '19 at 17:44

score 2 · Answer 1 · answered Jun 27 '15 at 22:28

The main thing to realize is that sed operates on individual lines, not on the whole file at once, which means that without special treatment it cannot obtain multi-line matches from a regex. In order to operate on the whole file at once, you first have to read the whole file into memory. There are many ways to do this; one of them is

sed '1h; 1!H; $!d; x; s/regex/replacement/' filename

This works as follows:

1h   # When processing the first line, copy it to the hold buffer.
1!H  # When processing a line that's not the first, append it to the hold buffer.
$!d  # When processing a line that's not the last, stop working here.
x    # If we get here, we just appended the last line to the hold buffer, so
     # swap hold buffer and pattern space. Now the whole file is in the pattern
     # space, where we can apply regexes to it.
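A minimal sketch of the slurp idiom in action, assuming GNU sed: once the whole input sits in the pattern space, a single s command can act on the embedded newlines. The function name slurp_join is a hypothetical helper, not part of the answer.

```shell
#!/bin/sh
# Hypothetical helper: read all of stdin into the pattern space with the
# 1h / 1!H / $!d / x idiom, then replace every embedded newline with a comma.
slurp_join() {
  sed '1h; 1!H; $!d; x; s/\n/,/g'
}

printf 'a\nb\nc\n' | slurp_join   # prints: a,b,c
```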

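I like to use this one because it doesn't involve jump labels. Some seds (notably BSD sed, as shipped with the BSDs and Mac OS X) are a bit prissy about those; for example, BSD sed treats everything after ":" up to the end of the line as the label name, so ":a;N" does not work there.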

So, all that's left is to formulate a multi-line regex. Since you didn't specify the delimiter patterns, let me assume that you want to remove lines between the first line that contains START and the last line that contains END. This could be done with

sed '1h; 1!H; $!d; x; s/\(START[^\n]*\).*\(\n[^\n]*END\)/\1\2/' filename

The regex does not contain anything spectacular; mainly you have to be careful to use [^\n] in the right places to avoid greedily matching beyond the end of a line.
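To see the substitution on a concrete input, here is a small sketch. It assumes GNU sed, which recognizes \n inside a bracket expression; in strict POSIX mode [^\n] would instead mean "neither a backslash nor n". The sample lines and the helper name strip_between are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: delete everything between the first line containing
# START and the last line containing END, keeping both marker lines.
strip_between() {
  sed '1h; 1!H; $!d; x; s/\(START[^\n]*\).*\(\n[^\n]*END\)/\1\2/' "$@"
}

# END appears twice; everything up to the second END is removed.
printf 'a\nSTART\nb\nEND\nc\nEND\nd\n' | strip_between   # prints a, START, END, d (one per line)
```

The greedy .* in the middle is what carries the match past the first END to the last one.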

Note that this will only work as long as the file is small enough to be read completely into memory. If this is not the case, my suggestion is to make two passes over the file with awk:

awk 'NR == FNR && /START/ && !start { start = NR } NR == FNR && /END/ { end = NR } NR != FNR && (!start || FNR <= start || FNR >= end)' filename filename

This works as follows: since filename is passed to awk twice, awk processes the file twice. NR is the overall record count (by default a record is a line), and FNR is the number of records read so far from the current file. During the first pass over the file, NR and FNR are equal; after that they are not. So:

# If this is the first pass over the file, the line matches the start pattern,
# and the start marker hasn't been set yet, set the start marker.
NR == FNR && /START/ && !start { start = NR }

# If this is the first pass over the file and the line matches the end pattern,
# set the end marker to the current line (so the end marker always identifies
# the last occurrence of the end pattern seen so far).
NR == FNR && /END/             { end   = NR }

# In the second pass, print every line if no start line was found; otherwise
# print those lines whose number is less than or equal to the start marker or
# greater than or equal to the end marker.
NR != FNR && (!start || FNR <= start || FNR >= end)
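A runnable sketch of the two-pass approach, using a temporary file because awk has to read the same file twice. The helper name strip_between_awk and the sample lines are hypothetical. It includes a "!start ||" guard so that a file with no START line is printed unchanged instead of losing the lines before the last END.

```shell
#!/bin/sh
# Hypothetical helper: pass 1 records the first START line and the last END
# line; pass 2 prints only the lines outside that range.
strip_between_awk() {
  awk '
    NR == FNR && /START/ && !start { start = NR }
    NR == FNR && /END/             { end   = NR }
    NR != FNR && (!start || FNR <= start || FNR >= end)
  ' "$1" "$1"
}

tmp=$(mktemp)
printf 'a\nSTART\nb\nEND\nc\nEND\nd\n' > "$tmp"
strip_between_awk "$tmp"   # prints a, START, END, d (one per line)
rm -f "$tmp"
```

Unlike the sed version, memory use does not grow with the file size, at the cost of reading the file twice.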

score 1 · Answer 2 · edited May 23 '17 at 12:14


To follow up on Wintermute's answer, if you've found a block that does match, you can delete it along the way, so you don't have to keep the entire file in memory:

sed '/^START$/{:a;N;/.*\nEND$/d;ba}'
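Below is a sketch of the same loop written one command per line, which sidesteps the label-parsing differences between sed implementations mentioned in the first answer. The helper name delete_block and the sample input are hypothetical. Note what the script does as written: the deleted block includes the START and END lines themselves, and it ends at the first END after START.

```shell
#!/bin/sh
# Hypothetical helper: when a line is exactly START, keep appending lines (N)
# until the pattern space ends with a line that is exactly END, then delete
# the whole accumulated block and continue streaming.
delete_block() {
  sed '/^START$/{
:a
N
/.*\nEND$/d
ba
}'
}

printf 'a\nSTART\nb\nEND\nc\n' | delete_block   # prints a and c
```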

(sorry, would have replied to Wintermute's answer, but apparently I still need 50 reputation points for that privilege)

answered Jun 28 '15 at 01:34 by Gumnos

score 0 · Answer 3 · answered Jun 27 '15 at 23:38

There is no example input, so I'm guessing an example file and the patterns /line3/ and /line6/ (the # comments below are annotations, not part of the file):

line1  #keep - everything up to and including the first pattern, line3
line2  #keep
line3  #keep
line4  #delete up to the last occurrence of line6
line5
line6a
line7
line6b
line8  #delete
line6c #keep - the last line6
line9  #keep
line10 #keep

A method without any dark voodoo, though inefficient, could be:

(sed -n '1,/line3/p' file; tail -r file | sed -n '1,/line6/p' | tail -r) > file2

file2 will then contain:

line1
line2
line3
line6c
line9
line10

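Explanation:

sed -n '1,/line3/p' file;  # print from line 1 up to and including the first line matching line3

tail -r file | sed -n '1,/line6/p' | tail -r
# reverse the file
# print the lines up to and including the first match of line6 (the last line6 of the original file)
# reverse the result back

(tail -r is available on BSD and macOS; GNU coreutils tail has no -r option, so on Linux use tac instead.)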
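A sketch of the same two-part idea for GNU/Linux, where tac takes the place of tail -r. The helper name keep_outside is hypothetical, and the sample file uses the lines from the example above without the annotations.

```shell
#!/bin/sh
# Hypothetical helper: print everything up to and including the first line
# matching line3, then everything from the last line matching line6 onward.
keep_outside() {
  sed -n '1,/line3/p' "$1"
  # Reverse, keep lines up to the first line6 seen from the bottom, reverse back.
  tac "$1" | sed -n '1,/line6/p' | tac
}

tmp=$(mktemp)
printf '%s\n' line1 line2 line3 line4 line5 line6a line7 line6b line8 line6c line9 line10 > "$tmp"
keep_outside "$tmp"   # prints line1 line2 line3 line6c line9 line10 (one per line)
rm -f "$tmp"
```

The file is read three times (once by sed, once per tac), which is why the answer calls the method inefficient.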
