How to delete lines between two patterns (inclusive of the patterns) when the pattern repeats

Question

I have a file that contains duplicate patterns. I want to delete all the lines between these patterns only when there are duplicate patterns.

For example, if the input file is:

Pattern1=File1
cat
dog
PatternEnd1
blah
blah
Pattern1=File1
fish
dog
Pattern1End
blah
blah
Pattern1=File1
tiger
dog
Pattern1End

The output should be:

Pattern1=File1
cat
dog
PatternEnd1
blah
blah
blah
blah

I tried `sed '/Pattern1=File1/,/PatternEnd1/d'`, but it deletes every block that matches. I want to delete only the repeated blocks while preserving the first occurrence.

I want to do this inside a Perl script.

Why do you tag `sed` and `awk` when you want `perl`? – Jotne Aug 12 '21 at 12:25

score 2 · Answer 1 · answered Aug 12 '21 at 11:52

There are a couple of ways to do it. I would use the hold space:

sed -n '/Pattern1=File1/{x;/^$/!p;d;};/Pattern1End/{n;h;d;};H'

If you encounter Pattern1=File1, print whatever's in the hold space (if anything) and move on. If you encounter Pattern1End, grab the next line and store it in the hold space, overwriting what was there. Otherwise, collect whatever you read in the hold space.
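
As a sketch of what the explanation above describes, the same script can be spread over several lines with comments. The function name `hold_space_filter` is made up for illustration, and the end marker is assumed to be spelled `Pattern1End`, as in the command.

```shell
# Hypothetical wrapper name; the sed script is the one-liner above, expanded.
hold_space_filter() {
  sed -n '
    /Pattern1=File1/ {
      # Swap: the start line goes to the hold space, the held text comes out.
      x
      # Print the held text unless it is empty, then start the next cycle.
      /^$/!p
      d
    }
    /Pattern1End/ {
      # Read the next input line and overwrite the hold space with it.
      n
      h
      d
    }
    # Any other line is appended to the hold space.
    H
  ' "$@"
}
```

It can be called as `hold_space_filter file`. As far as I can trace it against the question's sample with that spelling, this prints only the `blah` lines found between blocks, because the text collected in the hold space is overwritten at each end marker; it does not keep the first block.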

Beta

Hi, apologies, I misspelt the end patterns. The Pattern1=File1..PatternEnd1 blocks repeat. Only the first occurrence should be preserved and the repeated ones deleted. Thanks. – LovelyGeek Aug 12 '21 at 12:26

score 1 · Answer 2 · answered Aug 12 '21 at 13:45

In Perl you could use the flip-flop operator. For example:

perl -lne 'if (/^Pattern1=File1$/ .. /^Pattern1End$/) { 
              print if !$flag } else {$flag=1; print}' file

Håkon Hægland

Thanks for the input. However, this code removes all the duplicates; I want to preserve one of them and delete only the repeated ones. Thanks. – LovelyGeek Aug 12 '21 at 14:18
I tested it with the input you provided (after changing `PatternEnd1` to `Pattern1End`, as I assumed it was a typo) and it did not remove the first one. What input file did you use? – Håkon Hægland Aug 12 '21 at 14:30
I have also tested the sample file I provided and it works fine, but I am not sure why it did not work with my original file. The awk solution below works for me. Many thanks for the help, I really appreciate it. – LovelyGeek Aug 12 '21 at 20:08

score 1 · Answer 3 · answered Aug 12 '21 at 15:30

awk '/^Pattern1=File1$/ {f=f2;f1=1} !f; /^Pattern1End$/ {f2=f1;f=0}' file

With this method, the suppress flag f cannot be set until a start pattern and then an end pattern have been found, in that order, so the first complete block is always printed. (Are the "patterns" meant to be regular expressions? Consider "How do I find the text that matches a pattern?")
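
Since the logic was asked about in the comments, here is a sketch of the same one-liner spread over several lines with comments. The function name `dedupe_blocks` is made up for illustration; the awk program itself is unchanged, and the end marker is assumed to be spelled `Pattern1End` as in the command.

```shell
# Hypothetical wrapper name; the awk program is the one-liner above, expanded.
dedupe_blocks() {
  awk '
    # Start marker: suppress output only if a complete block was seen before.
    # f2 is still unset the first time, so the first block is printed.
    /^Pattern1=File1$/ { f = f2; f1 = 1 }

    # Print the current line unless suppression is active.
    !f

    # End marker: record that a start/end pair has completed (f2 = f1)
    # and turn suppression off for the lines that follow.
    /^Pattern1End$/ { f2 = f1; f = 0 }
  ' "$@"
}
```

It can be called as `dedupe_blocks file`, or fed from a pipe. Because the print rule sits between the two marker rules, the end line of a repeated block is still suppressed: f is only reset after that line has been considered for printing.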

Thank you @rowboat, the awk command did the magic for me. However, when I run it in a loop it does not give the right results (lots of duplicates); it only works when I pass a unique filename. As a workaround, I am redirecting the output to a temporary file for every new comparison coming from the loop. Is there any way to run it as an awk loop and avoid the temporary files? Thank you in advance. – LovelyGeek Aug 12 '21 at 20:13
Thanks, I have found a way to run it in a loop. However, I would really appreciate it if you could explain the logic; I am completely new to awk. – LovelyGeek Aug 15 '21 at 13:50

score 0 · Answer 4 · answered Aug 13 '21 at 10:10

This might work for you (GNU sed):

sed '/Pattern1=File1/{:a;N;/Pattern1End/!ba;x;/./{x;d};x;h}' file

Gather up lines between Pattern1=File1 and Pattern1End.

Check the hold space to see if a flag has been set and if so delete the collection.

Otherwise, set the flag and print the collection.
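
The three steps above can be sketched as a multi-line script with comments. The function name `keep_first_block` is made up for illustration; the commands are the same as in the one-liner.

```shell
# Hypothetical wrapper name; the sed script is the first one-liner above, expanded.
keep_first_block() {
  sed '
    /Pattern1=File1/ {
      # Gather lines into the pattern space until the end marker shows up.
      :a
      N
      /Pattern1End/!ba
      # Swap in the hold space, which serves as the "seen a block" flag.
      x
      /./ {
        # Flag already set: swap the new block back and delete it.
        x
        d
      }
      # Flag not set: swap the block back, copy it into the hold space
      # (setting the flag), and let sed print it at the end of the cycle.
      x
      h
    }
  ' "$@"
}
```

It can be called as `keep_first_block file`. The hold space starts out empty, so the `/./` test fails only for the first block, which is why that block alone gets printed.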

Alternative:

sed '/Pattern1=File1/,/Pattern1End/{x;/Pattern1End/{x;d};x;h}' file
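
The alternative comes without an explanation, so here is one reading of it as a commented sketch. The function name `keep_first_range` is made up for illustration, and the comments are my interpretation of the commands rather than the answerer's own description.

```shell
# Hypothetical wrapper name; the sed script is the alternative above, expanded.
keep_first_range() {
  sed '
    /Pattern1=File1/,/Pattern1End/ {
      # Swap so the hold space (the last line kept so far) can be tested.
      x
      /Pattern1End/ {
        # A full block was already kept: swap back and delete this line.
        x
        d
      }
      # Still inside the first block: swap back and remember the current line.
      x
      h
    }
  ' "$@"
}
```

It can be called as `keep_first_range file`. Once the first block has been printed, the hold space contains its end line, so every line of a later range is deleted one at a time instead of being gathered first.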
