The main thing to realize is that sed operates on individual lines, not on the whole file at once, so without special treatment a regex can never match across a line break. To operate on the whole file at once, you first have to read the whole file into memory. There are many ways to do this; one of them is
sed '1h; 1!H; $!d; x; s/regex/replacement/' filename
This works as follows:
1h # When processing the first line, copy it to the hold buffer.
1!H # When processing a line that's not the first, append it to the hold buffer.
$!d # When processing a line that's not the last, stop working here: delete the pattern space and start the next cycle without printing anything.
x # If we get here, we just appended the last line to the hold buffer, so
# swap hold buffer and pattern space. Now the whole file is in the pattern
# space, where we can apply regexes to it.
I like to use this one because it doesn't involve jump labels. Some seds (notably BSD sed, which comes with *BSD and Mac OS X) are a bit prissy when those are involved.
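To see the difference in practice, here is a small sketch that runs the same substitution with and without the slurping prefix. The sample content and the variable names (tmp, per_line, slurped) are made up for illustration.

```shell
# Hypothetical sample input: three lines in a temporary file.
tmp=$(mktemp)
printf 'foo\nbar\nbaz\n' > "$tmp"

# Line by line, the pattern space never contains a newline,
# so this regex cannot match and the input passes through unchanged.
per_line=$(sed 's/foo\nbar/X/' "$tmp")

# With the whole file collected in the pattern space, the same regex matches.
slurped=$(sed '1h; 1!H; $!d; x; s/foo\nbar/X/' "$tmp")

printf 'per line:\n%s\n' "$per_line"
printf 'slurped:\n%s\n' "$slurped"
rm -f "$tmp"
```

The first command prints foo, bar and baz unchanged; the second prints X followed by baz.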
So, all that's left is to formulate a multi-line regex. Since you didn't specify the delimiter patterns, let me assume that you want to remove the lines between the first line that contains START and the last line that contains END. This could be done with
sed '1h; 1!H; $!d; x; s/\(START[^\n]*\).*\(\n[^\n]*END\)/\1\2/' filename
The regex does not contain anything spectacular; mainly you have to be careful to use [^\n] in the right places to avoid greedily matching beyond the end of a line.
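As a quick check, the command can be tried on a made-up file with one START line and two END lines. This is only a sketch: the sample lines and the names tmp and result are invented, and it assumes a sed that reads \n inside a bracket expression as a newline, as GNU sed does.

```shell
# Hypothetical sample input; only the words START and END matter.
tmp=$(mktemp)
printf '%s\n' \
    'keep 1' \
    'a START b' \
    'drop 1' \
    'x END y' \
    'drop 2' \
    'last END here' \
    'keep 2' > "$tmp"

# Slurp the file, then cut everything between the first START line
# and the last END line (both delimiter lines are kept).
result=$(sed '1h; 1!H; $!d; x; s/\(START[^\n]*\).*\(\n[^\n]*END\)/\1\2/' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

With this input, the lines "drop 1", "x END y" and "drop 2" disappear, and the four remaining lines are printed in their original order.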
Note that this will only work as long as the file is small enough to be read completely into memory. If this is not the case, my suggestion is to make two passes over the file with awk:
awk 'NR == FNR && /START/ && !start { start = NR } NR == FNR && /END/ { end = NR } NR != FNR && (FNR <= start || FNR >= end)' filename filename
This works as follows: since filename is passed to awk twice, awk will process the file twice. NR is the overall record (line, by default) count, FNR the number of records read so far from the current file. In the first pass over the file, NR and FNR are equal; after that they're not. So:
# If this is the first pass over the file, the line matches the start pattern,
# and the start marker hasn't been set yet, set the start marker
NR == FNR && /START/ && !start { start = NR }
# If this is the first pass over the file and the line matches the end pattern,
# set the end marker to the current line (this means that the end marker will
# always identify the last occurrence of the end pattern that was seen so far)
NR == FNR && /END/ { end = NR }
# In the second pass, print those lines whose number is less than or equal to
# the start marker or greater than or equal to the end marker.
NR != FNR && (FNR <= start || FNR >= end)
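The two-pass version can be tried the same way. The sketch below uses an invented sample file (the names tmp and result are placeholders as well) and passes it to awk twice.

```shell
# Hypothetical sample input; only the words START and END matter.
tmp=$(mktemp)
printf '%s\n' \
    'keep 1' \
    'a START b' \
    'drop 1' \
    'x END y' \
    'drop 2' \
    'last END here' \
    'keep 2' > "$tmp"

# Pass 1 records start = 2 and end = 6 for this input;
# pass 2 prints lines 1-2 and 6-7.
result=$(awk '
    NR == FNR && /START/ && !start { start = NR }
    NR == FNR && /END/             { end = NR }
    NR != FNR && (FNR <= start || FNR >= end)
' "$tmp" "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

For this input the output is "keep 1", "a START b", "last END here" and "keep 2", one per line, and awk never holds more than one line in memory at a time.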