
I'm trying to build a regex that matches exactly two consecutive occurrences of a character from a character class. This is the regex I came up with:

(?<!\1)([^raol1c])\1(?!\1)

As you can see, it uses a negative lookahead and a negative lookbehind. But, as usual, the latter does not work: Java throws the well-known exception "Look-behind group does not have an obvious maximum length", even though the group clearly has a maximum length (exactly one character).

Ideally the regex should match "hh", "jhh", "ahh", "hhj", "hha" but not "hhh".

Any ideas on how to work around this?

greedybuddha
  • Is there a set length for the strings you are checking? – Andrew Clark May 20 '13 at 16:27
  • No, strings could be of any length. – user2402372 May 20 '13 at 16:30
  • I don't understand your rule: should `"hhaa"` match? – jlordo May 20 '13 at 16:42
  • Yes, because the regex matches "hh", then looks ahead for a negative match of "h" and looks behind for a negative match of "h", which holds in both cases. – user2402372 May 20 '13 at 16:45
  • For a really simple workaround beyond a single regex, search for `([^raol1c])\1+` and check that the result is not more than two characters long. – Martin Ender May 20 '13 at 16:51
  • possible duplicate of [Backreferences in lookbehind](http://stackoverflow.com/questions/2734977/backreferences-in-lookbehind) – Martin Ender May 20 '13 at 17:01
  • Note that even if backreferences could be directly used inside lookbehinds, this regex would not work, because at the time `(?<!\1)` would be evaluated, `\1` is not yet captured. It would need to be `([^raol1c])(?<!\1.)\1(?!\1)`, which does work in PCRE2. – Deadcode Dec 07 '19 at 07:30
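The two-step workaround suggested in the comments above (match the whole run with `([^raol1c])\1+`, then check its length) could look roughly like this in Java. This is only a sketch: the class name `RunLengthCheck` and the method `exactPairs` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RunLengthCheck {
    // Matches a whole run (two or more) of one repeated character
    // that is not in the excluded class.
    private static final Pattern RUN = Pattern.compile("([^raol1c])\\1+");

    // Hypothetical helper: returns every run that is exactly two characters long.
    public static List<String> exactPairs(String input) {
        List<String> pairs = new ArrayList<>();
        Matcher m = RUN.matcher(input);
        while (m.find()) {
            // \1+ is greedy, so the match covers the full run;
            // longer runs such as "hhh" are skipped here.
            if (m.group().length() == 2) {
                pairs.add(m.group());
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"hh", "jhh", "ahh", "hhj", "hha", "hhh", "hhaa"}) {
            System.out.println(s + " -> " + exactPairs(s));
        }
    }
}
```

Because no lookbehind is involved, this avoids the exception altogether, at the cost of a length check outside the regex.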

1 Answer


Here is a workaround. It's ugly but apparently it works:

(?<!(?=\1).)([^raol1c])\1(?!\1)

Wrapping the backreference in a zero-width lookahead inside the lookbehind leaves the lookbehind with an obvious fixed length (the single `.`), so Java accepts it.
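To see the difference, you can compile both patterns and catch `PatternSyntaxException`. This is a minimal sketch: the class name `LookbehindCompileCheck` and the helper `compiles` are made up for illustration, and it only checks whether Java accepts a pattern, not whether the pattern matches what you want.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class LookbehindCompileCheck {
    // Hypothetical helper: true if Java accepts the pattern, false if it rejects it.
    public static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            System.out.println("Rejected: " + e.getDescription());
            return false;
        }
    }

    public static void main(String[] args) {
        // Backreference directly inside the lookbehind (pattern from the question).
        System.out.println(compiles("(?<!\\1)([^raol1c])\\1(?!\\1)"));
        // Backreference wrapped in a lookahead inside the lookbehind.
        System.out.println(compiles("(?<!(?=\\1).)([^raol1c])\\1(?!\\1)"));
    }
}
```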

Disclaimer: I did not come up with this (unfortunately). It comes from [Backreferences in lookbehind](http://stackoverflow.com/questions/2734977/backreferences-in-lookbehind).

EDIT:

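The above pattern does not rule out hhh, though: when the lookbehind is evaluated, group 1 has not been captured yet, so the \1 inside it cannot match and the negative lookbehind always succeeds. However, this works: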

(?<!(.)(?=\1))([^raol1c])\2(?!\2)

If we capture the preceding character in a group inside the lookbehind, the nested lookahead can check that the first character after the lookbehind is not the same as the one before it. The doubled character then becomes group 2.
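Here is a small Java sketch that runs this pattern against the sample strings from the question. The class name `ExactPairDemo` and the method `firstPair` are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExactPairDemo {
    // Group 1 is captured inside the lookbehind (the preceding character);
    // group 2 is the doubled character itself.
    private static final Pattern EXACT_PAIR =
            Pattern.compile("(?<!(.)(?=\\1))([^raol1c])\\2(?!\\2)");

    // Hypothetical helper: returns the first exact pair found, or null if there is none.
    public static String firstPair(String input) {
        Matcher m = EXACT_PAIR.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"hh", "jhh", "ahh", "hhj", "hha", "hhh", "hhaa"}) {
            System.out.println(s + " -> " + firstPair(s));
        }
    }
}
```

With these inputs, every string except "hhh" should yield "hh", and "hhh" should yield no match.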

Working demo.

Community
Martin Ender