Delete lines between last matching patterns

Question

First of all, I am aware of these nice questions. My question is a bit different: given the text format below coming from a file1:

Pattern 1
some text to keep
nice text here
Pattern 1
another text to keep
Pattern 1
REMOVE ME
AND ME
ME TOO PLEASE
Pattern 2

How can I remove only the text between the last Pattern 1 and Pattern 2, including the pattern lines themselves, so that file1 now contains:

Pattern 1
some text to keep
nice text here
Pattern 1
another text to keep

I would prefer a solution with sed, but any other solution (perl, bash, awk) would do just fine.

score 2 · Answer 1 · answered Apr 19 '18 at 22:04

perl -ne 'if    (/Pattern 1/) { print splice @buff; push @buff, $_ }
          elsif (/Pattern 2/) { @buff = () }
          elsif (@buff)       { push @buff, $_ }
          else                { print }
' -- file

When you see Pattern 1, print any lines accumulated so far and start a new buffer (@buff) with the current line. When you see Pattern 2, clear the buffer. For any other line, push it to the buffer if the buffer has been started; otherwise print it (this covers text before the first Pattern 1 or after Pattern 2).

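Note: The behaviour of Pattern 2 without a previous Pattern 1 was not specified. Likewise, a final Pattern 1 block that is never closed by Pattern 2 stays in the buffer and is not printed by this code.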

ghoti · Accepted Answer · 2018-04-19T23:20:20.960

2

I can't think of a way to do this simply and elegantly in sed alone. It might be possible to do this with sed using write-only code, but I'd need a really good reason to write something like that. :-)

You still might be able to use sed for this in conjunction with other tools:

$ tac test.txt | sed '/^Pattern 2$/,/^Pattern 1$/d' | tac
Pattern 1
some text to keep
nice text here
Pattern 1
another text to keep

If your system doesn't have a tac on it, you can create one with:

$ alias tac="awk '{L[i++]=\$0} END {for(j=i-1;j>=0;)print L[j--]}'"

or in keeping with the theme:

$ alias tac='sed '\''1!G;h;$!d'\'

That said, I'd do this in awk, like so:

$ awk '/Pattern 1/{printf "%s",b;b=""} {b=b $0 ORS} /Pattern 2/{b=""} END{printf "%s",b}' test.txt
Pattern 1
some text to keep
nice text here
Pattern 1
another text to keep

Or split out for easier reading/commenting:

awk '
  /Pattern 1/ {          # If we find the start pattern,
    printf "%s",b        # print the buffer (or nothing if it's empty)
    b=""                 # and empty the buffer.
  }
  {                      # Add the current line to a buffer, with the
    b=b $0 ORS           # correct output record separator.
  }
  /Pattern 2/ {          # If we find our close pattern,
    b=""                 # just empty the buffer.
  }
  END {                  # And at the end of the file,
    printf "%s",b        # print the buffer if we have one.
  }' test.txt

This is roughly the same as hek2mgl's solution, but orders things a little more reasonably and uses ORS. :-)

Note that both of these solutions behave correctly only if Pattern 2 exists only once within the file. If you have multiple blocks, i.e. with both start and end patterns included, you'll need to work a little harder for this. If this is the case, please provide more detail in your question.
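
For the multiple-block case mentioned above, one possible approach is to read the file twice with awk: the first pass finds the last complete Pattern 1 ... Pattern 2 block, the second pass prints everything outside it. This is only a sketch under that assumption; the sample data, the file name multi.txt, and the variable names start, from and to are made up for illustration.

```shell
# Hypothetical sample: two complete blocks; only the last one should be removed.
tmpdir=$(mktemp -d)
printf '%s\n' 'Pattern 1' 'keep A' 'Pattern 2' 'middle' \
              'Pattern 1' 'keep B' 'Pattern 1' 'REMOVE ME' 'Pattern 2' 'tail' \
              > "$tmpdir/multi.txt"

last_block_removed=$(awk '
  # Pass 1 (NR == FNR): remember the most recent "Pattern 1" line number and,
  # each time "Pattern 2" closes a block, record that pair as the range to drop.
  NR == FNR {
    if (/Pattern 1/) start = FNR
    else if (/Pattern 2/ && start) { from = start; to = FNR; start = 0 }
    next
  }
  # Pass 2: print every line that lies outside the recorded range.
  !(from && FNR >= from && FNR <= to)
' "$tmpdir/multi.txt" "$tmpdir/multi.txt")

printf '%s\n' "$last_block_removed"
rm -rf "$tmpdir"
```

Here the earlier block (keep A) survives because each later Pattern 2 overwrites the recorded range, so only the final pair is dropped. A Pattern 2 with no open Pattern 1 before it is ignored.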

edited Apr 19 '18 at 23:20

answered Apr 19 '18 at 22:54

ghoti


Doesn't `sed` fail if multiple ranges exist? – revo Apr 19 '18 at 23:00
@revo, not that I'm aware. If this doesn't work for you, I'd love to hear about it, along with the version of sed you're using (or what OS, etc). – ghoti Apr 19 '18 at 23:01
Duplicate input file content and try yourself. With a blank line at end. – revo Apr 19 '18 at 23:04
@ghoti Any idea how I can do that in file? – Mikhail Apr 19 '18 at 23:10
@revo, I'm not seeing what you're seeing. I'm using sed on FreeBSD, but GNU sed is behaving the same way. – ghoti Apr 19 '18 at 23:11
@Mikhail, you mean, in a shell script? Sure, easy! Add your script so far to your question, along with the errors you're getting, and we can help you debug. – ghoti Apr 19 '18 at 23:13
I'm talking about the address range you applied. It's matched globally. So if the OP has several such blocks in the file and wants only the last one removed, `tac` followed by `sed` simply removes every block that matches the range. I'm not talking about sed implementations. `tac`ing the current file yields a single block running from `Pattern 2` through `Pattern 1`; that's why only this block is removed and your approach outputs the right content. – revo Apr 19 '18 at 23:13
@Ghoti, sorry for being unclear, I mean given the input text is in `file1`, how can I use your `tac | sed | tac` solution to modify `file1` directly? I tried adding `>file1` to the end of pipe but apparently I get just an empty file afterwards. – Mikhail Apr 19 '18 at 23:15
@revo, Ah, yes, I see what you mean. I've flagged that limitation at the bottom of the answer, but I'll otherwise leave this for the moment, until we get a better idea in the question of what the OP is actually after. – ghoti Apr 19 '18 at 23:18
@Mikhail, ah, you can't. You'll need to script what `sed` does already when you use the `-i` option, redirect your output to a temporary file, then move the temporary file over your original (perhaps moving the original out of the way as a backup first). It would be awesome if you could add the details of what you really need to do, along with your attempt and results, to your question. – ghoti Apr 19 '18 at 23:19
@ghoti, I see, I got the general idea about temp files. I updated the question to the best of my ability; as for what I attempted - well, I don't know sed and friends well enough to try something apart from what's in other similar questions, hence this question – Mikhail Apr 19 '18 at 23:25
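
The temporary-file approach described in the comments above could look roughly like this. It is a sketch, not a hardened script: the working directory, the `.bak` backup name and the helper `rev_lines` (an awk stand-in for `tac`, defined as a function because aliases are not normally expanded in non-interactive shells) are all assumptions made for the example.

```shell
# Work on a throwaway copy of the question's sample (paths are placeholders).
workdir=$(mktemp -d)
printf '%s\n' 'Pattern 1' 'some text to keep' 'nice text here' \
              'Pattern 1' 'another text to keep' \
              'Pattern 1' 'REMOVE ME' 'AND ME' 'ME TOO PLEASE' 'Pattern 2' \
              > "$workdir/file1"

# Reverse the line order with awk so the sketch does not depend on tac.
rev_lines() {
  awk '{ L[i++] = $0 } END { for (j = i - 1; j >= 0; j--) print L[j] }' "$@"
}

# Redirecting straight to file1 would truncate it before it is read, so write
# to a temporary file first. The "if" sees only the exit status of the last
# pipeline stage (no pipefail here).
tmp=$(mktemp "$workdir/file1.XXXXXX")
if rev_lines "$workdir/file1" | sed '/^Pattern 2$/,/^Pattern 1$/d' | rev_lines > "$tmp"; then
  cp "$workdir/file1" "$workdir/file1.bak"   # optional backup of the original
  mv "$tmp" "$workdir/file1"                 # replace the original
else
  rm -f "$tmp"
fi

result=$(cat "$workdir/file1")
printf '%s\n' "$result"
rm -rf "$workdir"
```

Creating the temporary file in the same directory as the original keeps the final `mv` on one filesystem, where it is a rename rather than a copy.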

hek2mgl · Answer 3 · 2018-04-19T22:24:17.453

1

With awk:

awk '
# On pattern 1 and when the buffer is not empty, flush the buffer
/Pattern 1/ && b!="" { printf "%s", b; b="" }

# Append the current line and a newline to the buffer
{ b=b""$0"\n" }

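# Clean the buffer on pattern 2
/Pattern 2/ { b="" }

# At end of input, print whatever is still buffered (e.g. lines after Pattern 2)
END { printf "%s", b }' file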

edited Apr 19 '18 at 22:24

answered Apr 19 '18 at 22:15

hek2mgl


potong · Answer 4 · 2018-04-20T08:44:29.210

This might work for you (GNU sed):

sed '/Pattern 1/,${//{x;//p;x;h};//!H;$!d;x;s/.*Pattern 2[^\n]*\n\?//;/^$/d}' file

The general idea here is to gather up lines beginning with Pattern 1 and then either flush those lines when another line beginning with Pattern 1 is encountered or at end-of-file remove the lines between Pattern 1 and Pattern 2 and print what is left over.

Focus on the lines between the first line containing Pattern 1 and the end-of-file; all other lines are printed as normal. If a line contains Pattern 1, swap to the hold space and, if the held lines also contain the same regexp, print them; then swap back and replace the hold space with the current line. If the current line does not contain the regexp, append it to the hold space. If it is not the last line of the file, delete it. At the end-of-file, swap to the hold space, remove everything up to and including the line containing Pattern 2, and print what remains.

N.B. a tricky situation arises as in your example, when the line containing Pattern 2 is the last line of the file. As sed uses newline to delimit lines, it removes them before placing the line into the pattern space and appends them prior to printing. If the pattern/hold space is empty, sed will append a newline, which in this case would add a spurious newline. The solution is to remove any lines between Pattern 1 and Pattern 2 including any newline following the line containing Pattern 2. If there are additional lines these will be printed as normal, however if there were no lines following, the hold space will now be empty and as it must have contained something before, since it is now empty it can safely be deleted.
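
The newline behaviour described above can be seen in isolation with a one-line input. This is only an illustration of that point, not part of the answer's command; the variable names are made up.

```shell
# Emptying the pattern space still makes sed print one (blank) line ...
blank_lines=$(printf 'x\n' | sed 's/.*//' | wc -l | tr -d ' ')

# ... whereas deleting the now-empty pattern space prints nothing at all.
no_lines=$(printf 'x\n' | sed 's/.*//;/^$/d' | wc -l | tr -d ' ')

echo "without d: $blank_lines line(s); with d: $no_lines line(s)"
```

This is why the command ends with `/^$/d`: without it, a file whose last line is Pattern 2 would gain a spurious empty line.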
