I'm trying to remove the lines between two patterns including the lines with the patterns themselves, if another pattern is found between them, but I'm not sure how to tackle it.

Say I have an input like the following and want to delete lines #6 to #11 because the pattern notthis is found between the patterns start and end:

start
AHBUe3Ar5NoD
3EcuCcD2QCja
7VmlKFbD8Rbi
end
start
OgytsRhZbD8T
notthis
0PlcUh2RLvVW
tsz2S80SyW9p
end
start
dQ5qiZCvBqcK
SufdS40X1Sh2
B1cyNshOj2Z4
end

I changed what I thought I understood from this answer to something like this, but it doesn't work:

/^start$/{$!{N;/^start\n(.*\n)*notthis.*\n(.*\n)*end/d;ty;P;D;:y}}

Is it because N only appends the line following the initial pattern ^start$ to the pattern space and ignores what follows? And what would be the correct way to achieve what I am trying to do?

Dudi Boy
David
  • Hopefully there comes a point when writing a script packed with runes like `/^start$/{$!{N;/^start\n(.*\n)*notthis.*\n(.*\n)*end/d;ty;P;D;:y}}` when you think to yourself - "wtf am I doing???"! – Ed Morton Jul 09 '19 at 16:24

3 Answers

sed is for simple substitutions on individual strings, that is all. For anything else you should be using awk. For example, with GNU awk (needed for the multi-char RS), this brief script produces the output you want from the input you posted:

$ awk 'BEGIN{RS=ORS="end\n"} !/notthis/' file
start
AHBUe3Ar5NoD
3EcuCcD2QCja
7VmlKFbD8Rbi
end
start
dQ5qiZCvBqcK
SufdS40X1Sh2
B1cyNshOj2Z4
end

or, more clearly, more robustly, and in a form that is easier to enhance, with any awk:

$ cat tst.awk
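/start/ { f = 1 }                  # a block begins: start buffering
f {
    rec = rec $0 ORS               # append the current line to the block
    if ( /end/ ) {                 # the block is complete
        if ( rec !~ /notthis/ ) {
            printf "%s", rec       # print it only if notthis never appeared
        }
        rec = ""                   # reset for the next block
        f = 0
    }
}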
$
$ awk -f tst.awk file
start
AHBUe3Ar5NoD
3EcuCcD2QCja
7VmlKFbD8Rbi
end
start
dQ5qiZCvBqcK
SufdS40X1Sh2
B1cyNshOj2Z4
end

The above will work efficiently and robustly using any awk in any shell on every UNIX box, is easy to understand and trivial to modify if/when your requirements change.
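
As an illustration of how the second script might be adapted, here is a minimal sketch under assumptions that are not in the original: the three patterns are passed in through the hypothetical variables beg, fin and bad, start and end are anchored to whole lines, and lines outside any block are printed rather than dropped (tst.awk above prints only the blocks). The sample input is shortened and made up for the demonstration.

```shell
# Sketch only: beg, fin and bad are hypothetical variable names.
out=$(printf '%s\n' before start AHBUe3Ar5NoD end start OgytsRhZbD8T notthis end after |
awk -v beg='^start$' -v fin='^end$' -v bad='notthis' '
    $0 ~ beg { f = 1 }                 # a block opens on a whole-line match of beg
    f {
        rec = rec $0 ORS               # buffer the block one line at a time
        if ( $0 ~ fin ) {              # the block is complete
            if ( rec !~ bad ) {
                printf "%s", rec       # keep the block only if bad never matched
            }
            rec = ""
            f = 0
        }
        next                           # block lines are never printed directly
    }
    { print }                          # assumption: pass lines outside blocks through
')
printf '%s\n' "$out"
```

With this input the block containing notthis should disappear, while the first block and the lines before and after remain.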

Ed Morton
  • Thank you, works like a charm. Would you mind explaining both? – David Jul 09 '19 at 16:36
  • The first one reads one whole record (multi-line block of text) at a time where each record ends with `end\n` and prints the record if it doesn't contain `notthis`. The second one sets a flag when `start` is found, builds up the record one line at a time while that flag is set, and then when `end` is found prints the record if it doesn't contain `notthis`. Let me know if you have any specific questions about the syntax after a glance at the awk man page. – Ed Morton Jul 09 '19 at 16:42

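Here is another awk script. I hope it matches the problem description, at least partially: it drops only the notthis lines inside each start/end section rather than the whole section.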

script.awk

BEGIN {omitMark = "notthis"}  # assign omit marker as RegExp
/start/, /end/ {   # define RegExp range for omission section
    if ($0 ~ omitMark) next;  # if the line matches the omission marker, skip it
    print;  # print the current line in the section (not omitted)
    next;   # move on to the next input line
}
1;  # print any line not in a section

input.txt

start
AHBUe3Ar5NoD
3EcuCcD2QCja
7VmlKFbD8Rbi
end
start
OgytsRhZbD8T
notthis
0PlcUh2RLvVW
tsz2S80SyW9p
end
notthis
start
dQ5qiZCvBqcK
SufdS40X1Sh2
B1cyNshOj2Z4
notthis
end
notthis

running:

awk -f script.awk input.txt

output:

start
AHBUe3Ar5NoD
3EcuCcD2QCja
7VmlKFbD8Rbi
end
start
OgytsRhZbD8T
0PlcUh2RLvVW
tsz2S80SyW9p
end
notthis
start
dQ5qiZCvBqcK
SufdS40X1Sh2
B1cyNshOj2Z4
end
notthis
Dudi Boy
  • Make sure to understand the discussion at https://stackoverflow.com/q/23934486/1745001 if you're ever considering using a range expression. – Ed Morton Jul 09 '19 at 17:30
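
For comparison, one way to get the same filtering without a range expression is an explicit flag. This is a minimal sketch, not taken from the linked discussion; the variable name insec and the shortened sample input are made up for illustration.

```shell
# Sketch only: insec is a hypothetical flag name.
out=$(printf '%s\n' start a notthis b end notthis start c end |
awk '
    /start/ { insec = 1 }            # entering a section
    insec && /notthis/ { next }      # inside a section: skip the marker line
    { print }                        # everything else is printed
    /end/ { insec = 0 }              # leaving the section, after printing its end line
')
printf '%s\n' "$out"
```

The notthis line inside the first section is dropped, while the one between the two sections is kept, which matches the behaviour of script.awk on the input above.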

This might work for you (GNU sed):

sed '/^start/{:a;N;/end$/!ba;/notthis/d}' file

Gather up the lines from start to end and, if they contain the string notthis, delete them.
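
To make the steps easier to follow, here is the same set of commands written one per line with comments. N appends only one line each time it runs, which is why the loop back to the label is needed to collect the whole block. The sample input is shortened and made up for the demonstration.

```shell
out=$(printf '%s\n' start keep1 end start drop1 notthis drop2 end start keep2 end |
sed '
/^start/{
    # Label marking the top of the gathering loop.
    :a
    # Append the next input line to the pattern space.
    N
    # Loop back to a until the pattern space ends with end.
    /end$/!ba
    # The whole block is now buffered: delete it if notthis occurs anywhere.
    /notthis/d
}')
printf '%s\n' "$out"
```

Note that /end$/ also matches a block whose last line merely ends in "end"; testing for /\nend$/ instead would be a stricter choice if that matters for your data.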

potong