3

I've been trying to use sed to accomplish the following. Let's say I have the following file (note: my actual file is more complicated than this):

hello world
foo bar
people people
target
something
done

I want to check whether target exists between two patterns (in this example, between the lines foo bar and done, both inclusive) and delete that whole block if it does.

I know how to delete the lines between the two patterns using this sed command:

sed '/people.*/,/done/d' file

But I only want to delete those lines if the string target appears between the two matches.

My logic has been something like this:

sed -n '/people.*/,/done/p' file | check if target string exists | delete entire pattern found by sed

EDIT

I forgot to mention that there can be any number of words before target and after target on the same line.

buydadip

4 Answers

4

Sed

This will remove from $start to $end if it finds $pattern in it:

sed ":a;N;\$!ba; s/$start.*$pattern.*$end//g"

There are two steps (statements) here:

  1. Read the entire file as a single string (can be bad depending on file size). For a very good explanation, refer to https://stackoverflow.com/a/1252191. The only difference is the additional backslash before the $!ba, which makes it work inside double quotes; that is useful for passing Bash variables into the sed line.
  2. The regular old search/replace.
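
To spell out what `:a`, `N` and `$!ba` each do, here is a minimal sketch of the same script split into one `-e` expression per command. The sample data comes from the question; the temp file and the values given to `start`, `pattern` and `end` are assumptions for illustration only.

```shell
# Sample data from the question, written to a throwaway temp file.
input=$(mktemp)
printf '%s\n' 'hello world' 'foo bar' 'people people' 'target' 'something' 'done' > "$input"

# Assumed values; the answer leaves these to the caller.
start='foo bar'
pattern='target'
end='done'

# :a     defines a label named "a"
# N      appends the next input line to the pattern space
# $!ba   on every line except the last ($!), branches (b) back to label a
# s/...  therefore runs once, against the whole file held as one string
result=$(sed -e ':a' -e 'N' -e '$!ba' \
    -e "s/$start.*$pattern.*$end//g" "$input")

printf '%s\n' "$result"
rm -f "$input"
```

Splitting the script this way means only the substitution needs double quotes, so the `$` in `$!ba` can stay in single quotes without a backslash.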

Perl

To handle ungreedy matches, if Perl is allowed, use:

perl -0777 -p -e "s/$start.*?$pattern.*?$end//s"

This will also read the entire file as a string. The /s at the end lets . match newlines, so the match can span lines. Use .* instead of .*? to go back to a greedy search.

bsravanin
  • works perfectly...I know what `N` does, but could you add a little explanation as to `:a` and `$!ba` – buydadip Jan 02 '15 at 19:36
  • Will this not match greedily if there's another `done` somewhere later in the file? – Wintermute Jan 02 '15 at 19:42
  • True. sed cannot handle ungreedy matches. For that one will have to move to PCRE regex engines. Maybe Perl or Python? – bsravanin Jan 02 '15 at 19:48
  • Your answer works great for the sample I provided above, but unfortunately it matches greedily on some other files I've tried :( – buydadip Jan 02 '15 at 19:54
2

sed is an excellent tool for simple substitutions on a single line, but all of its constructs for handling multiple lines became obsolete in the mid-1970s when awk was invented. So just use awk for simplicity, clarity, robustness, etc., e.g. with GNU awk for multi-char RS:

$ awk -v RS='^$' '{sub(/\nfoo bar\n.*target.*\ndone\n/,""); print}' file
hello world
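
Where GNU awk is not available, a line-by-line variant may be an option. This is a sketch of a different technique from the one above, not the answer's own: it buffers each block and prints it only if target never appeared. The variable names `inblock`, `buf` and `has_target` and the second sample block are assumptions for illustration.

```shell
# Question sample plus one extra block that does not contain "target".
input=$(mktemp)
printf '%s\n' 'hello world' 'foo bar' 'people people' 'target' 'something' 'done' \
    'foo bar' 'test' 'done' > "$input"

result=$(awk '
    # A start line opens a fresh buffer, but only when not already inside a block.
    !inblock && /^foo bar$/ { inblock = 1; buf = ""; has_target = 0 }
    inblock {
        buf = buf $0 "\n"
        if ($0 ~ /target/) has_target = 1
        if ($0 ~ /^done$/) {
            # Block complete: emit it only if target never appeared.
            if (!has_target) printf "%s", buf
            inblock = 0
        }
        next
    }
    # Lines outside any block pass through unchanged.
    { print }
    # An unterminated block at end of file is printed unchanged.
    END { if (inblock) printf "%s", buf }
' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

Because each block is judged separately, a later `done` cannot be swallowed by a greedy match.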
Ed Morton
1

A way to do this without reading the entire file into memory first and inviting greedy-match issues if the file contains done several times is

sed '/^people/ { :loop; N; /\ndone/ ! b loop; /target/ d }' filename

On Mac OS X it is apparently necessary to have a newline before the closing brace, so there you can either put the code into a multiline string literal:

sed '/^people/ { :loop; N; /\ndone/ ! b loop; /target/ d 
}' filename

Or put this (in any case more readable) version of the code in a file, say foo.sed, and use sed -f foo.sed filename:

/^people/ {
  :loop
  N
  /\ndone/ ! b loop
  /target/ d
}

The code works as follows:

/^people/ {

In a line that starts with "people"

  :loop
  N
  /\ndone/ ! b loop

fetch more lines in a loop until one starts with done (this will be the first time \ndone appears in the pattern space)

  /target/ d

If there's target somewhere in all that, discard the whole thing

}

otherwise proceed as usual (which means printing the pattern space because we didn't pass -n to sed).
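
As a quick check of the walkthrough above, the sketch below runs the script from a file against a made-up input with two people sections, only the first of which contains target. The temp files and sample lines are assumptions for illustration.

```shell
script=$(mktemp)
input=$(mktemp)

# The readable version of the script, stored in a file for use with sed -f.
cat > "$script" <<'EOF'
/^people/ {
  :loop
  N
  /\ndone/!b loop
  /target/d
}
EOF

# Two sections: the first contains "target", the second does not.
printf '%s\n' 'people one' 'target' 'done' 'people two' 'other' 'done' 'tail' > "$input"

result=$(sed -f "$script" "$input")
printf '%s\n' "$result"
rm -f "$script" "$input"
```

Each section is collected and tested on its own, so the second section and the trailing line survive.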

One possible improvement for robustness is

sed '/^people/ { :loop; N; /\ndone$/! { $! b loop }; /target/ d }' filename

or

/^people/ {
  :loop
  N
  /\ndone$/ ! {
    $ ! b loop
  }
  /target/ d
}

with the change /\ndone$/! { $! b loop }. This will end the loop on the last line of the file even if no done is encountered, which has the effect that unfinished people sections at the end of a file are not discarded (unless they contain target).
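
The effect of the `$!` guard can be seen on input whose last people section never reaches a done line. This is a sketch with made-up sample lines; the temp file name is incidental.

```shell
script=$(mktemp)

# The more robust variant: stop looping on the last input line.
cat > "$script" <<'EOF'
/^people/ {
  :loop
  N
  /\ndone$/!{
    $!b loop
  }
  /target/d
}
EOF

# Unterminated section without "target": kept as is.
kept=$(printf '%s\n' 'hello' 'people a' 'x' | sed -f "$script")
# Unterminated section containing "target": discarded.
dropped=$(printf '%s\n' 'hello' 'people a' 'target' | sed -f "$script")

printf 'kept:\n%s\n\ndropped:\n%s\n' "$kept" "$dropped"
rm -f "$script"
```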

Wintermute
  • This answer looks great, but when I try to run it, I get this error : `sed: 1: "/^people/ { :loop; N; ...": unexpected EOF (pending }'s)`...Any reason why? I'm using OSx by the way. – buydadip Jan 02 '15 at 20:05
  • Yeah, I misread the error (hence the ninja delete). Hang on. I have no means of testing OSX, but maybe there's a documentation of differences somewhere. – Wintermute Jan 02 '15 at 20:11
  • Does it work if you add a `;` after the `d` at the end? – Wintermute Jan 02 '15 at 20:12
  • nope, still get the same error, I'm thinking it might have something to do with the `{` brackets, I've run into problems with them before while using sed. – buydadip Jan 02 '15 at 20:13
  • Okay, I tried it in my FreeBSD VM (which apparently has the same sed as OSX). It works if you put the readable version of the code into a file and use `sed -f foo.sed filename`, or if you stretch the string literal with the code over several lines (with a newline before the `}`). – Wintermute Jan 02 '15 at 20:21
  • I put the readable version in a sed file, but now I get a different error, in line 4 of the readable version: sed: 4: `do.sed: unterminated regular expression` – buydadip Jan 02 '15 at 20:38
  • Not `sed do.sed filename`, `sed -f do.sed filename`. – Wintermute Jan 02 '15 at 23:26
1

Late answer

sed '/^foo bar *$/,/^done *$/{/^done *$/!{H;d};/^done *$/{H;g;s/.*//g;x;/.*target.*/d;s/^.//g}}'

Find all lines in the range /^foo bar *$/,/^done *$/, which is a stricter form of:

/foo bar/,/done/

The /^done *$/!{H;d} part takes every line from foo bar up to, but not including, the closing done line and appends it to the hold space, then deletes it from the pattern space.

The /^done *$/{H;g;s/.*//g;x; part takes the closing done line and appends it to the hold space, which now holds every line from foo bar to done. It then clears the pattern space and swaps it with the hold space, so the block lands in the pattern space and the hold space is left empty for the next range between foo bar and done.

Finally,

/.*target.*/d 

tests whether target appears in the multi-line pattern space. If it does, the whole range from foo bar to done is deleted. Otherwise s/^.//g strips the leading newline that the first H added, and the block is printed.

This avoids reading the whole file as a single string.
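
For readability, the same commands can be laid out one per line in a script file with comments. This is a sketch of the one-liner above, reformatted and run on a shortened, made-up input; the temp files are incidental.

```shell
script=$(mktemp)
input=$(mktemp)

cat > "$script" <<'EOF'
/^foo bar *$/,/^done *$/ {
  # Not the closing line yet: stash the line in the hold space, print nothing.
  /^done *$/!{
    H
    d
  }
  # Closing line: append it, then move the block into the pattern space
  # and leave the hold space empty for the next block.
  /^done *$/{
    H
    g
    s/.*//g
    x
    /.*target.*/d
    # The first H put a newline in front of the block; drop that character.
    s/^.//g
  }
}
EOF

# First block contains "target", second does not.
printf '%s\n' 'hello world' 'foo bar' 'target' 'done' 'foo bar' 'test' 'done' > "$input"

result=$(sed -f "$script" "$input")
printf '%s\n' "$result"
rm -f "$script" "$input"
```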

Example

hello world
foo bar
people people
target
something
done
foo bar
.....
.....
.....
done
foo bar
people people
test
something
done

results

hello world
foo bar
.....
.....
.....
done
foo bar
people people
test
something
done

Note: the range from foo bar to done that contains the target line has been deleted; the other two ranges are left untouched.

repzero