print lines from one match to another unless third match is between

Question

Is there a way in bash to print the lines from one match to another, unless a third match occurs between those lines? Let's say the file is:

A
B
C
D
E
A
B
Z
C
D

I want to print each block of lines from "A" to "C", but skip any block that contains "Z", so the output should be:

A
B
C

I'm currently using this code to match the lines between "A" and "C":

awk '/C/{p=0} /A/{p=1} p'

RavinderSingh13 · Answer 1 · 2021-12-01T11:15:02.883

With your shown samples, please try the following awk code.

awk '
/A/         { found=1 }
/C/ && !noVal && found{
     print value ORS $0
     found=noVal=value=""
}
found && /Z/{ noVal=1 }
found{
     value=(value?value ORS:"")$0
}
' Input_file

Explanation: a detailed, line-by-line explanation of the code above.

awk '                                ##Starting awk program from here.
/A/         { found=1 }              ##Checking condition if line has A then set found to 1.
/C/ && !noVal && found{              ##Checking if C is found and noVal is NULL and found is set then do following.
     print value ORS $0              ##printing value ORS and current line here.
     found=noVal=value=""            ##Nullifying found,noVal and value here.
}
found && /Z/{ noVal=1 }              ##Checking if found is SET and Z is found then set noVal here.
found{                               ##Checking if found is set here.
     value=(value?value ORS:"")$0    ##Creating value which has current line in it and keep concatenating values to it.
}
' Input_file                         ##Mentioning Input_file name here.
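
One thing to watch with the code as posted: noVal is only cleared when a block is printed, so once a block containing Z has been seen, no later block would print. Below is a minimal sketch of one possible variant that resets the state at every C instead. The third block in the input (A, X, C) is a made-up addition to the question's sample, there only to show the difference.

```shell
# Sample from the question plus a hypothetical third block (A, X, C).
input=$(printf '%s\n' A B C D E A B Z C D A X C)

result=$(printf '%s\n' "$input" | awk '
/A/          { found = 1 }                              # start of a block
found        { value = (value ? value ORS : "") $0 }    # collect every line of the block
found && /Z/ { noVal = 1 }                              # remember that the block contains Z
found && /C/ {                                          # end of the block
    if (!noVal) print value                             # print only blocks without Z
    found = noVal = value = ""                          # always reset, printed or not
}')

printf '%s\n' "$result"
```

With this input the sketch prints the first and third blocks (A, B, C and A, X, C) and drops the one containing Z.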

Pierre François · Answer 2 · 2021-12-01T16:12:21.730


I would use A as record separator and C as field separator. So, the A to C range would be in $1 (except the A itself at the beginning and the C at the end) and the rest up to the next A in $2.

The trick is to print only the first field $1, and only if the record doesn't contain any Z. Skip the first record, which will be empty.

So try:

awk 'BEGIN{RS="A";FS="C"}(NR > 1) && !/Z/{print "A" $1 "C"}' inputfile

Or even better, according to a comment of Ed Morton below:

awk 'BEGIN{RS="A";FS="C"}(NR > 1) && !/Z/{print RS $1 FS}' inputfile

If a Z can occur after the C (between C and the next A), the code will have to be corrected, because !/Z/ tests the whole record and not just $1.
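
A possible correction, as a minimal sketch: test only the first field with $1 !~ /Z/ instead of the whole record. The input below is a modified version of the question's sample, with a Z placed after the first C; that modification is an assumption made only for this illustration.

```shell
# Modified sample: the first block is followed by a Z that lies outside the A..C range.
input=$(printf '%s\n' A B C Z D A B Z C D)

# RS="A" splits the input into records at each A; FS="C" puts the A..C range into $1.
# Only $1 is checked for Z, so a Z after the C no longer suppresses the block.
result=$(printf '%s\n' "$input" | awk 'BEGIN { RS = "A"; FS = "C" } NR > 1 && $1 !~ /Z/ { print RS $1 FS }')

printf '%s\n' "$result"
```

On this input the sketch prints A, B, C for the first block and skips the second, whereas the !/Z/ version would print nothing.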


Thanks Pierre, I've used a hybrid of your solution and I think it works :) – Piotrek Cyran Dec 01 '21 at 13:48

Ed Morton · Answer 3 · 2021-12-01T12:19:55.810

This uses full-line string matching instead of the partial-line regexp matching used in your question and the other answers posted so far (see how-do-i-find-the-text-that-matches-a-pattern for the difference) as I expect it's what you should really be using for a robust solution:

$ cat tst.awk
$0 == "A" { inBlock=1 }
inBlock {
    rec = rec $0 ORS
    if ( $0 == "C" ) {
        if ( !index(ORS rec ORS, ORS "Z" ORS) ) {
            printf "%s", rec
        }
        rec = ""
        inBlock = 0
    }
}

$ awk -f tst.awk file
A
B
C

If you REALLY wanted to continue to use partial-line regexp matching that'd be this:

$ cat tst.awk
/A/ { inBlock=1 }
inBlock {
    rec = rec $0 ORS
    if ( /C/ ) {
        if ( rec !~ /Z/ ) {
            printf "%s", rec
        }
        rec = ""
        inBlock = 0
    }
}

but that's fragile if your real data isn't just single letters on their own lines.
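
To illustrate the fragility, here is a small sketch that counts how many lines each kind of test matches. The line BANANA is a made-up example of data that is not a single letter on its own line.

```shell
# BANANA is a hypothetical line that contains A but is not the line "A".
result=$(printf '%s\n' BANANA A B C | awk '
/A/       { regexp_hits++ }   # partial-line regexp: true for any line containing A
$0 == "A" { string_hits++ }   # full-line string comparison: true only for the line A
END       { print regexp_hits, string_hits }')

printf '%s\n' "$result"   # prints: 2 1
```

The regexp test would open a block at BANANA as well, while the string comparison opens one only at the line that is exactly A.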

score 0 · Answer 4 · answered Dec 01 '21 at 12:54

You can use sed to do what you described:

sed '/^A$/,/^C$/!d; /^Z$/d' example-data

# gives
A
B
C
A
B
C

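!d means delete lines which don't match the address. The second command, /^Z$/d, then deletes only the Z lines themselves, which is why the rest of the second block is still printed.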

Your expected result was only three lines, though. To stop after the first block, you could use sed '/^A$/,/^C$/!d; /^Z$/d; /^C$/q' example-data. Or you could remove the duplicates with sed '/^A$/,/^C$/!d; /^Z$/d' example-data | sort -u.
