-1

I have a file that contains repeated blocks, each delimited by a start pattern and an end pattern. I want to delete every repeated block, including the lines between the two patterns, and keep only the first occurrence.

For example, if the input file is:

Pattern1=File1
cat
dog
PatternEnd1
blah
blah
Pattern1=File1
fish
dog
Pattern1End
blah
blah
Pattern1=File1
tiger
dog
Pattern1End

The output should be:

Pattern1=File1
cat
dog
PatternEnd1
blah
blah
blah
blah

I tried sed '/Pattern1=File1/,/PatternEnd1/d', but it deletes every block that matches, including the first one. I want to delete only the repeated blocks while preserving the first occurrence.

I want to do this inside a Perl script.

Dada
LovelyGeek

4 Answers

2

There are a couple of ways to do it. I would use the hold space:

sed -n '/Pattern1=File1/{x;/^$/!p;d;};/Pattern1End/{n;h;d;};H'

If you encounter Pattern1=File1, print whatever's in the hold space (if anything) and move on. If you encounter Pattern1End, grab the next line and store it in the hold space, overwriting what was there. Otherwise, collect whatever you read in the hold space.

Beta
  • Hi, apologies, I misspelt the end patterns. Pattern1=File1..PatternEnd1 is the repeating block. Only its first occurrence should be preserved, and the repeated ones deleted. Thanks. – LovelyGeek Aug 12 '21 at 12:26
1

In Perl you could use the flip-flop operator. For example:

perl -lne 'if (/^Pattern1=File1$/ .. /^Pattern1End$/) { 
              print if !$flag } else {$flag=1; print}' file
Håkon Hægland
  • Thanks for the input. However, this code removes all the duplicates; I want to preserve one of them and delete all the repeated ones. Thanks. – LovelyGeek Aug 12 '21 at 14:18
  • I tested it with the input you provided (after changing `PatternEnd1` to `Pattern1End`, as I assumed it was a typo) and it did not remove the first one. What input file did you use? – Håkon Hægland Aug 12 '21 at 14:30
  • I have also tested it with the sample file I provided and it works fine, but I am not sure why it did not work with my original file. The awk solution below works for me. Many thanks for the help, I really appreciate it. – LovelyGeek Aug 12 '21 at 20:08
1
awk '/^Pattern1=File1$/ {f=f2;f1=1} !f; /^Pattern1End$/ {f2=f1;f=0}' file

With this method, f cannot be set until a start pattern and an end pattern have been found, in that order. (Are the "patterns" meant to be regular expressions? Consider "How do I find the text that matches a pattern?")
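
The flag logic may be easier to follow with the one-liner spelled out rule by rule. This is the same program with comments added. The wrapper name dedupe_blocks and the sample input are illustrative, and the end marker is assumed to be spelled Pattern1End.

```shell
# dedupe_blocks is a hypothetical wrapper name; the awk rules are those of the one-liner.
dedupe_blocks() {
  awk '
    # Start marker: hide output only if a complete block has been seen before
    # (f2 is still unset for the first block). Remember that a start was seen.
    /^Pattern1=File1$/ { f = f2; f1 = 1 }

    # A pattern with no action prints the line: print whenever f is unset or 0.
    !f

    # End marker: a start came before this end, so later blocks are repeats.
    # Clearing f lets the lines after the block print again.
    /^Pattern1End$/    { f2 = f1; f = 0 }
  ' "$@"
}

# Sample run: prints the first block and both "blah" lines, drops the second block.
printf '%s\n' Pattern1=File1 cat Pattern1End blah \
              Pattern1=File1 fish Pattern1End blah | dedupe_blocks
```

Note that awk variables keep their values across input files, so when several files are passed to a single awk invocation the flags carry over from one file to the next. Running awk once per file, or adding a rule such as FNR==1 {f=f1=f2=0}, keeps each file independent.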

  • Thank you @rowboat, the awk utility did the magic for me. However, when I ran it in a loop it did not give me the right results (lots of duplicates); it only works when I give it a unique filename. As a workaround I'm redirecting the output to temporary files for every new comparison coming from the loop. Is there any way I can run it as an awk loop and avoid redirecting to temporary files? Thank you in advance. – LovelyGeek Aug 12 '21 at 20:13
  • Thanks, I have found a way to run it in a loop. However, if you could explain the logic I would really appreciate it. I'm completely new to awk. – LovelyGeek Aug 15 '21 at 13:50
0

This might work for you (GNU sed):

sed '/Pattern1=File1/{:a;N;/Pattern1End/!ba;x;/./{x;d};x;h}' file

Gather up lines between Pattern1=File1 and Pattern1End.

Check the hold space to see if a flag has been set and if so delete the collection.

Otherwise, set the flag and print the collection.
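
The three steps above can be laid out as a commented, multi-line script. This is a sketch of the same commands, one per line; the wrapper name keep_first_sed and the sample input are illustrative.

```shell
# keep_first_sed is a hypothetical wrapper name, not part of the original answer.
keep_first_sed() {
  sed '
    /Pattern1=File1/{
      # Gather the whole block into the pattern space.
      :a
      N
      /Pattern1End/!ba
      # Peek at the hold space: non-empty means a block was already printed.
      x
      /./{
        x
        d
      }
      # First block: swap back, then copy it to the hold space as the flag.
      x
      h
    }
  ' "$@"
}

# Sample run: the second block is deleted, everything else is printed.
printf '%s\n' Pattern1=File1 cat Pattern1End blah \
              Pattern1=File1 fish Pattern1End blah | keep_first_sed
```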

Alternative:

sed '/Pattern1=File1/,/Pattern1End/{x;/Pattern1End/{x;d};x;h}' file
potong