I want to find instances where a captured group does not appear later in the string:

aaaBbb  = CccBbb  <- format is valid, skip
aaaDddd = CccDddd <- format is valid, skip
aaaEeee = CccFfff <- format is not valid, match this one only

This pattern matches the lines I don't want (the valid ones): https://regex101.com/r/lon87L/1

/^ +\w+([A-Z][a-z]+) += +\w+\1$/mg

I've read on https://www.regular-expressions.info/refadv.html that PHP doesn't support backreferences inside a negative lookbehind, although some other regex implementations do. So something like this would match the invalid lines I want, but it doesn't work in PHP:

/^ +\w+([A-Z][a-z]+) += +\w+(?<!\1)$/mg

Is there anything else that would work, other than matching all three lines and looping through the matches in a PHP foreach?

Redzarf
  • Negative lookbehinds require a compile-time fixed length. A backreference is a runtime item with variable length. One option is to use `(?>\1(*SKIP)(*FAIL)|\w)+`, which matches the backreference and then forces a failure. This is probably quicker too. –  Nov 15 '18 at 22:19
  • You can see it here https://regex101.com/r/6gfSBi/1 Btw, only the Dot-Net engine supports variable width lookbehinds (including backreferences). –  Nov 15 '18 at 22:22
  • If it has to be at the EOS, just add a `$` after the backref https://regex101.com/r/QuXJLY/1 –  Nov 15 '18 at 22:27
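
The `(*SKIP)(*FAIL)` idea from the comments can be tried in PHP roughly as follows. This is only a sketch: the comment gives just the `(?>\1(*SKIP)(*FAIL)|\w)+` fragment, so the way it is attached to the question's prefix here is an assumption, as are the leading spaces in the sample lines (needed because the pattern starts with `^ +`). PHP has no `g` flag; `preg_match_all` does the global search.

```php
<?php
// Sample lines from the question; the leading spaces are assumed.
$input = implode("\n", [
    '  aaaBbb  = CccBbb',
    '  aaaDddd = CccDddd',
    '  aaaEeee = CccFfff',
]);

// Assumed full pattern: the question's prefix plus the fragment from the comment.
// If \1 is found on the right-hand side, (*SKIP)(*FAIL) abandons this match attempt;
// otherwise each character is consumed by \w.
$pattern = '/^ +\w+([A-Z][a-z]+) += +(?>\1(*SKIP)(*FAIL)|\w)+$/m';

preg_match_all($pattern, $input, $matches);

// $matches[0] holds the full text of every matched line.
print_r($matches[0]);
```

With this input only the `aaaEeee = CccFfff` line should be reported.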

2 Answers

Try using a negative lookahead instead of a negative lookbehind. It works equally well, plus it works in PHP.

^ +\w+([A-Z][a-z]+) += +(?!\w+\1).*$

regex101 demo

PHP demo
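
For reference, a minimal PHP sketch of the lookahead pattern above. The sample lines come from the question; the leading spaces are an assumption (the pattern starts with `^ +`), and `preg_match_all` with the `m` flag stands in for the `/mg` flags, since PHP has no `g` modifier.

```php
<?php
// Sample lines from the question; the leading spaces are assumed.
$input = implode("\n", [
    '  aaaBbb  = CccBbb',
    '  aaaDddd = CccDddd',
    '  aaaEeee = CccFfff',
]);

// The negative lookahead rejects a line when the captured group
// appears again somewhere in the word after the "=".
$pattern = '/^ +\w+([A-Z][a-z]+) += +(?!\w+\1).*$/m';

preg_match_all($pattern, $input, $matches);

// $matches[0] holds the full matched lines, $matches[1] the captured groups.
print_r($matches[0]);
```

With this input only the `aaaEeee = CccFfff` line is matched. One caveat: if a line has two or more spaces after the `=`, the ` +` can backtrack by one space so that the lookahead starts on a space and passes, which lets a valid line through. Making that quantifier possessive (` ++`) is one way to avoid it.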

Ethan
One option would be to put a negative lookahead for \1$ right before each repeated \w after the =:

^ +\w+([A-Z][a-z]+) += +(?:(?!\1$)\w)+$
                        ^^^^^^^^^^^^^^

https://regex101.com/r/lon87L/2

But that only excludes a match if the backreference occurs right at the end of the string. If you want to ensure that the previously matched phrase doesn't occur anywhere within the final \ws, just remove the $ from inside the repeated group:

^ +\w+([A-Z][a-z]+) += +(?:(?!\1)\w)+$
                                ^

https://regex101.com/r/lon87L/3
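
To see the difference between the two patterns, here is a small PHP sketch. The fourth input line is a hypothetical addition (not from the question) in which the captured text appears at the start of the right-hand side instead of the end. The leading spaces are also assumed, because both patterns start with `^ +`.

```php
<?php
$input = implode("\n", [
    '  aaaBbb  = CccBbb',
    '  aaaDddd = CccDddd',
    '  aaaEeee = CccFfff',
    '  aaaGgg  = GggHhh', // hypothetical: \1 occurs, but not at the end
]);

// Rejects a line only when \1 sits right at the end of the line.
$endOnly = '/^ +\w+([A-Z][a-z]+) += +(?:(?!\1$)\w)+$/m';

// Rejects a line when \1 occurs anywhere in the final word.
$anywhere = '/^ +\w+([A-Z][a-z]+) += +(?:(?!\1)\w)+$/m';

preg_match_all($endOnly, $input, $endOnlyMatches);
preg_match_all($anywhere, $input, $anywhereMatches);

print_r($endOnlyMatches[0]);  // the Eeee/Ffff line and the Ggg/GggHhh line
print_r($anywhereMatches[0]); // only the Eeee/Ffff line
```

Which variant is appropriate depends on whether a repeat in the middle of the right-hand side should count as valid.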

CertainPerformance