
Given the following input

$ cat pre
stuff MACRO1 stuff MACRO2
stuff MACRO2 stuff MACRO1
stuff MACRO2 stuff

I want to replace MACRO2 (with MACRO3) if MACRO1 also exists. Like so:

$ perl -ne '/(?=.*MACRO1).*MACRO2/ ? print s/MACRO2/MACRO3/gr : print' pre
stuff MACRO1 stuff MACRO3
stuff MACRO3 stuff MACRO1
stuff MACRO2 stuff

(I imagine the .*MACRO2 part of this expression is unnecessary, now that I think about it.)

Edit: A less stupid version of the above, based on feedback so far:

$ perl -ne '/MACRO1/ ? print s/MACRO2/MACRO3/gr : print' pre

What I am trying to figure out is how to do it with just a regex. Here is one attempt:

$ perl -ne 'print s/(?=.*MACRO1)(?=.*MACRO2)MACRO2/MACRO3/gr' pre
stuff MACRO1 stuff MACRO2
stuff MACRO3 stuff MACRO1
stuff MACRO2 stuff

I think I am having some fundamental confusion about how a lookahead operator can be both an "anchor" and "non-consuming" at the same time. If I think about ?= as an anchor, it makes sense to me that the above doesn't work. But that would seem to contradict "non-consuming".

Can anyone define what is meant by non-consuming and show me a regex that would produce the desired results?

zzxyz
  • Re "*If I think about ?= as an anchor*", huh??? – ikegami Aug 18 '17 at 18:50
  • Why do you want to do it in just one regex? Breaking the code into easily-digested pieces will make maintaining it easier, faster, and cheaper. If you have trouble figuring it out, then everyone else will too. – shawnhcorey Aug 19 '17 at 13:51

2 Answers


First, let's get the actual solution out there:

perl -pe's/MACRO2/MACRO3/g if /MACRO1/'

Now, let's look at your peculiar request. As a single substitution, it would look something like the following:

perl -pe's/MACRO2(?:(?<=MACRO1.*)|(?=.*MACRO1))/MACRO3/g'

Setting aside the fact that this doesn't work, because variable-width lookbehinds aren't supported, it is also incredibly inefficient. The time required by the first solution is proportional to the size of the file, while the time required by this one is proportional to the size of the file times the number of instances of MACRO2!
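For learning purposes only, a single substitution that does run can sidestep the lookbehind with Perl's backtracking control verbs. This is a minimal sketch, not a recommendation: the sub name `replace_macro2` is hypothetical, it assumes each string holds exactly one line, and the status of `(*SKIP)` and `(*FAIL)` should be checked in perlre for your Perl version.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every MACRO2 with MACRO3, but only when
# the line also contains MACRO1. Assumes $line holds a single line.
sub replace_macro2 {
    my ($line) = @_;

    # Branch 1: at the start of a line without MACRO1, swallow the whole
    #           line, then (*SKIP)(*FAIL) rejects it and tells the engine
    #           not to retry anywhere inside the swallowed text.
    # Branch 2: otherwise, match MACRO2 wherever it occurs.
    return $line =~ s/^(?!.*MACRO1).*(*SKIP)(*FAIL)|MACRO2/MACRO3/gr;
}

print replace_macro2($_), "\n" for (
    'stuff MACRO1 stuff MACRO2',
    'stuff MACRO2 stuff MACRO1',
    'stuff MACRO2 stuff',
);
```

Because the first branch is anchored with `^`, the `.*MACRO1` scan happens only at the start of the line rather than once per MACRO2. The two-statement version above is still far easier to read.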

ikegami
  • My reason for wanting to do it one substitution was mostly just to learn. If it's a very stupid approach for this problem, that's good enough for me. I'm still interested in understanding what "non-consuming" means. Does that part of my question make sense? – zzxyz Aug 18 '17 at 18:49
  • Non-consuming means the body of the lookaround is considered to have matched 0 characters from the perspective of the outside of the lookaround. e.g. `"ab" =~ /a(?=\w)b/` matches, while `"ab" =~ /a\wb/` wouldn't. – ikegami Aug 18 '17 at 18:56
  • Okay, I think this makes sense. I think what was throwing me off was the most upvoted answer here: https://stackoverflow.com/questions/469913/regular-expressions-is-there-an-and-operator . It's super-upvoted, so I am probably just misunderstanding that answer. It certainly doesn't work as I thought it would. – zzxyz Aug 18 '17 at 19:04
  • Compare `'def abc' =~ /^(?=.*abc)(?=.*def)/` and `'def abc' =~ /^.*abc.*def/`. `.*abc` consumes characters (all 7 of them here), but `(?=.*abc)` consumes none, so `(?=.*def)` starts matching at position 0 too. – ikegami Aug 18 '17 at 19:12
  • Sigh...no, I still don't get it. I just do not understand how this operator works and no reading I do seems to clear it up. I do not get why `/(?=a)(?=b)/` doesn't match for `ab` OR `ba` – zzxyz Aug 18 '17 at 19:13
  • Does `/c(?=a)(?=b)/` make it clearer? Remember, all matches are preceded by an implicit `\A(?s:.)*?`. There is no position which is followed by `a` and which is followed by `b`. – ikegami Aug 18 '17 at 19:15
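
The examples from the comments above can be tried side by side. The following is a rough sketch for experimenting; the helper name `matches` is made up for illustration, and the last two cases reuse the attempt from the question to show why it only works when MACRO1 comes after MACRO2.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if $subject matches $pattern, else 0.
sub matches {
    my ($subject, $pattern) = @_;
    return $subject =~ $pattern ? 1 : 0;
}

my @cases = (
    # The lookahead inspects "b" but consumes nothing,
    # so the literal b that follows can still match it.
    [ 'ab', qr/a(?=\w)b/ ],
    # Here \w consumes "b", leaving nothing for the literal b.
    [ 'ab', qr/a\wb/ ],
    # Both lookaheads start at position 0 and each scans the whole string.
    [ 'def abc', qr/^(?=.*abc)(?=.*def)/ ],
    # Without lookaheads, .*abc consumes the string and .*def has nothing left.
    [ 'def abc', qr/^.*abc.*def/ ],
    # No single position is followed by both "a" and "b".
    [ 'ab', qr/(?=a)(?=b)/ ],
    # At MACRO2's position, the lookahead only sees text to its right.
    [ 'stuff MACRO1 stuff MACRO2', qr/(?=.*MACRO1)MACRO2/ ],
    [ 'stuff MACRO2 stuff MACRO1', qr/(?=.*MACRO1)MACRO2/ ],
);

printf "%-28s %-28s %s\n", "'$_->[0]'", $_->[1], matches(@$_) ? 'match' : 'no match'
    for @cases;
```

A lookahead is zero-width like an anchor, in that it tests a condition at the current position, and non-consuming in that the position does not move afterwards, whatever the lookahead's body scanned.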

Then there's always:

$rec =~ s/^(.*)MACRO1(.*)MACRO2(.*)$/${1}MACRO1${2}MACRO3${3}/;

O.k., per comment (what if MACRO2 comes first):

sub foo {
    my ($x) = @_;
    # Match the full name so a stray "2" elsewhere in the text is left alone.
    $x =~ s/MACRO2/MACRO3/;
    return $x;
}

Then:

$s3 =~ s/^(.*)((MACRO1(.*)MACRO2)|(MACRO2(.*)MACRO1))(.*)$/$1.($3?foo($3):foo($5)).$7/e;

Of course, this requires the function (I called it foo).

Could possibly be done without a function, nesting match/replace in the replacement pattern to do the same thing, but I was getting a lot of syntax grief from Perl (my fault I'm sure).
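
One possible way to nest the substitution in the replacement, without a helper like foo, is to combine /e with a non-destructive /r substitution on the captured text. This is only a sketch under some assumptions: the sub name `replace_nested` is made up, each string is assumed to hold a single line, and /r needs Perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical helper; assumes $line holds a single line.
sub replace_nested {
    my ($line) = @_;

    # The lookahead only admits lines containing MACRO1; (.*) then
    # captures the line, and the /e replacement runs a second
    # substitution on that captured text (/r returns the modified copy).
    return $line =~ s/^(?=.*MACRO1)(.*)$/$1 =~ s{MACRO2}{MACRO3}gr/er;
}

print replace_nested($_), "\n" for (
    'stuff MACRO1 stuff MACRO2',
    'stuff MACRO2 stuff MACRO1',
    'stuff MACRO2 stuff',
);
```

Using braces as the inner delimiters avoids clashing with the outer `/` delimiters, which is a likely source of the syntax trouble mentioned above.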

davernator
  • This is interesting and I juuuust figured it out. One problem is that it doesn't work if MACRO2 comes first. Is that fixable? – zzxyz Aug 18 '17 at 22:51
  • Of course, and probably in several ways, but (and hats off to you and ikegami ... it's been a while, this is fun) it requires a function: – davernator Aug 19 '17 at 01:13
  • First, hats off and thanks to you (zzxyz) and ikegami. It's been a while, and this is fun. – davernator Aug 19 '17 at 01:14
  • O.k., I put in the previous 2 comments, then added the further edit. Apologies for my newbiehood in comment editing. Learning ... – davernator Aug 19 '17 at 01:22