Extract lines between two patterns and remove in between lines with if condition

Question

I have a file with the content shown below. I want to extract the block delimited by matching start and end patterns and, within it, exclude any inner block whose numeric id (possibly a pattern) does not match. Here everything other than [001] has to be excluded. The inner id 002 may not be known in advance, so I want only the lines that belong to [001].

The file contains:

    text [001] start
    line 1
    line 2
    text [002] mid start
    line 3     
    line 4
    text [002] mid end
    line 5
    line 6
    text [001] end

I need the [001] block with the block of the non-matching numeric id [002] excluded:

    text [001] start
    line 1
    line 2
    line 5
    line 6
    text [001] end

I couldn't find a clear explanation of this problem on the internet. Can anyone help with an awk or sed solution?

To get the block between the start and end patterns, I am trying:

    awk '/[001]/ && /start/, /001/ && /end/' File

choroba · Accepted Answer · 2019-06-25T13:47:16.627

1

Use sed or Perl:

sed '/001.*start/,/001.*end/!d;/002.*start/,/002.*end/d'

perl -ne 'print if /001.*start/ .. /001.*end/
                and not /002.*start/ .. /002.*end/'

A negative look-ahead assertion makes the excluded tag dynamic, so an inner block with any id other than 001 is dropped:

perl -ne 'print if /001.*start/ .. /001.*end/
                and not /text \[(?!001).*start/ .. /text \[(?!001).*end/'
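
awk has no look-ahead, but the same "drop every inner block whose id is not 001" idea can be approximated with two flags. The following is only a minimal sketch and not part of the original answer: the function name `extract_block`, the variables `tgt`, `inside` and `skip`, and the file name `sample.txt` are made up for illustration, and it assumes that the start/end keyword is the last field of a tag line and that inner blocks are not nested inside each other.

```shell
# Recreate the sample input from the question (sample.txt is a hypothetical name).
cat > sample.txt <<'EOF'
text [001] start
line 1
line 2
text [002] mid start
line 3
line 4
text [002] mid end
line 5
line 6
text [001] end
EOF

# extract_block ID FILE: print block ID, dropping inner blocks that carry any other id.
extract_block() {
    awk -v tgt="[$1]" '
        # Tag lines contain a bracketed numeric id; other lines skip this rule.
        match($0, /\[[0-9]+\]/) {
            id = substr($0, RSTART, RLENGTH)
            if (id == tgt) {
                if ($NF == "start") inside = 1
                if ($NF == "end") {           # print the closing tag, then leave the block
                    if (inside && !skip) print
                    inside = 0
                    next
                }
            } else {
                if ($NF == "start") { skip = 1; next }   # foreign block begins
                if ($NF == "end")   { skip = 0; next }   # foreign block ends
            }
        }
        inside && !skip    # print lines of the target block outside foreign blocks
    ' "$2"
}

extract_block 001 sample.txt
```

On the sample input this prints the six lines the question asks for. Because the foreign id is never named, the same call works when the inner block is tagged [007] or anything else.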

edited Jun 25 '19 at 13:47

answered Jun 25 '19 at 11:26

choroba

How do you find the 001 block? You assume you just have to remove the 002 block. – Jotne Jun 25 '19 at 11:43
Aha! Non-representative sample input. – choroba Jun 25 '19 at 11:55
It's written in the OP's post: `I am trying to extract the block with matching start and end patterns`. That would be block 001. – Jotne Jun 25 '19 at 11:57
Updated. Maybe I understood "extract" too broadly. – choroba Jun 25 '19 at 11:59
Can you suggest a solution that works with any other inner blocks? Here 002 may vary, so I need to remove all blocks other than 001. That means taking 002 as the pattern may not fit. @choroba – Megkcalb Jun 25 '19 at 12:51
@Megkcalb: Check the update, I used negative look-aheads to check that the number of the block is different. – choroba Jun 25 '19 at 13:47

Jotne · Answer 2 · 2019-06-25T11:50:55.103

1

This awk may do. You may need to tweak the trigger patterns to work with your data:

awk '/\[001\] start/{f=1} /\[002\] .* start/{f=0} f;  /\[001\] end/{f=0}  /\[002\] .* end/{f=1}' file
    text [001] start
    line 1
    line 2
    line 5
    line 6
    text [001] end

More readable:

awk '
    /\[001\].*start/ {f=1}
    /\[002\].*start/ {f=0} 
    f;  
    /\[001\].*end/ {f=0}
    /\[002\].*end/ {f=1}
    ' file

Just change the trigger patterns to reflect your real data.

edited Jun 25 '19 at 11:50

answered Jun 25 '19 at 11:43

Jotne

What if the roles of 001 and 002 are inverted? – kvantour Jun 25 '19 at 11:48
@kvantour I guess this is just example code. He needs a block of data, and within that block there are some things to remove. – Jotne Jun 25 '19 at 11:51
I think any number but `001` can appear inside and those "sub-blocks" should be removed. – Wiktor Stribiżew Jun 25 '19 at 11:54
Instead of sub-blocks, they may be referred to as blocks with a non-matching numeric id located within one certain block. Here 001 is the needed block and 002 is a non-matching block. @WiktorStribiżew – Megkcalb Jun 25 '19 at 12:00
@Megkcalb so none of the answers helps you? Not giving +1 or accepting one? – Jotne Jun 25 '19 at 16:21

score 1 · Answer 3 · answered Jun 25 '19 at 11:47

Use two flags: b1 is set while we are inside block 001 and b2 is set while we are inside block 002. A line is printed only when b1 is set and b2 is not:

awk '/001/ && /start/ { b1=1 }
     /002/ && /start/ { b2=1 }
     (b1 && !b2)
     /002/ && /end/   { b2=0 }
     /001/ && /end/   { b1=0 }' file

Range expressions are handy, but to quote Ed Morton: "Never use range expressions (e.g. /start/,/end/) as they make trivial tasks very slightly briefer but then require duplicate conditions or a complete rewrite for the tiniest requirements change."
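
To make that trade-off concrete, here is a small side-by-side sketch. The function names `range_version` and `flag_version` and the file name `sample.txt` are made up for this illustration. The range expression prints the whole 001 block including the inner 002 lines, while the flag version takes the extra exclusion by adding one more flag.

```shell
# Recreate the sample input from the question (sample.txt is a hypothetical name).
cat > sample.txt <<'EOF'
text [001] start
line 1
line 2
text [002] mid start
line 3
line 4
text [002] mid end
line 5
line 6
text [001] end
EOF

# Range expression: selects the 001 block, inner 002 lines included.
range_version() {
    awk '/\[001\].*start/,/\[001\].*end/' "$1"
}

# Flag version: same block, with one extra flag to leave out the inner block.
flag_version() {
    awk '
        /\[001\].*start/ { b1 = 1 }   # entering the wanted block
        /\[002\].*start/ { b2 = 1 }   # entering the unwanted inner block
        b1 && !b2                     # print only inside 001 and outside 002
        /\[002\].*end/   { b2 = 0 }   # leaving the inner block
        /\[001\].*end/   { b1 = 0 }   # leaving the wanted block
    ' "$1"
}

range_version sample.txt
echo '---'
flag_version sample.txt
```

The order of the rules matters in the flag version: the start flags are set before the print rule and cleared after it, so the `[001]` tag lines are printed and the `[002]` tag lines are not.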

answered Jun 25 '19 at 11:47

kvantour

What if block 002 ends after block 001? Both yours and mine will fail. – Jotne Jun 25 '19 at 11:55
Overlapping blocks ... fascinating. This will still print everything from block 1 that does not belong to block 2. – kvantour Jun 25 '19 at 11:58

score 1 · Answer 4 · answered Jun 25 '19 at 16:21

Assuming your blocks can be nested to any depth but never overlap:

$ cat tst.awk
BEGIN { tgtId="001" }

match($0,/\[[0-9]+\]/) {
    id = substr($0,RSTART+1,RLENGTH-2)
    state = $NF
}

state == "start"  { isTgtBlock[++depth] = (id == tgtId ? 1 : 0) }

isTgtBlock[depth] { print }

state == "end"    { --depth }

{ id = state = "" }

$ awk -f tst.awk file
    text [001] start
    line 1
    line 2
    line 5
    line 6
    text [001] end
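
The sample in the question only nests one level deep, so as a hedged illustration here is the same depth-tracking logic run on a made-up input with two levels of nesting. The file name `nested.txt`, the function name `extract_nested`, and passing the target id with `-v tgtId=...` instead of `BEGIN` are assumptions added for this sketch.

```shell
# Hypothetical input: block 003 nested inside 002, which is nested inside 001.
cat > nested.txt <<'EOF'
text [001] start
line 1
text [002] mid start
line 2
text [003] inner start
line 3
text [003] inner end
line 4
text [002] mid end
line 5
text [001] end
EOF

# extract_nested ID FILE: print only lines whose innermost enclosing block has id ID.
extract_nested() {
    awk -v tgtId="$1" '
        # On tag lines, pick up the id (without brackets) and the start/end keyword.
        match($0, /\[[0-9]+\]/) {
            id = substr($0, RSTART + 1, RLENGTH - 2)
            state = $NF
        }
        state == "start"  { isTgtBlock[++depth] = (id == tgtId) }  # push a block
        isTgtBlock[depth]                                          # print if innermost block is the target
        state == "end"    { --depth }                              # pop the block
        { id = state = "" }                                        # reset for the next line
    ' "$2"
}

extract_nested 001 nested.txt
```

Only the lines that sit directly inside block 001 come out: the two `[001]` tag lines plus `line 1` and `line 5`. Everything inside 002, including the nested 003 block, is dropped.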

score 0 · Answer 5 · answered Jun 25 '19 at 21:25

This might work for you (GNU sed):

sed -n '/\[001\]/,/\[001\]/{/\[002\]/,/\[002\]/!p}' file

Print only lines between [001] delimiters and exclude those lines between [002] delimiters.
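
Since the question says the inner id may not be known, the same nesting of ranges can be adapted so that 002 is never named. This is a sketch rather than part of the original answer: it assumes tag lines end in `start` or `end` as in the sample, that inner blocks are not nested in each other, and it uses a made-up file name `sample.txt`. It is written one command per line so it does not rely on GNU-only one-liner syntax.

```shell
# Recreate the sample input from the question (sample.txt is a hypothetical name).
cat > sample.txt <<'EOF'
text [001] start
line 1
line 2
text [002] mid start
line 3
line 4
text [002] mid end
line 5
line 6
text [001] end
EOF

# Within the 001 block: always print the 001 tag lines; for every other line,
# print it unless it falls inside a start..end range of some other bracketed id.
sed -n '
/\[001\].*start/,/\[001\].*end/{
  /\[001\]/p
  /\[001\]/!{
    /\[[0-9]*\].*start/,/\[[0-9]*\].*end/!p
  }
}' sample.txt
```

The inner range is only evaluated for lines that do not carry the `[001]` tag, so the outer start line cannot accidentally open an excluded range.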

answered Jun 25 '19 at 21:25

potong
