
I can't figure out why `\d+` never gets to the point of capturing only 12 in the first example, yet the engine does move on and match 56 in the second.

The regex `(?=(\d+))\w+\1` never matches `123x12`. First the lookaround captures `123` into `\1`. `\w+` then matches the whole string and backtracks until it matches only `1`. Finally, `\w+` fails since `\1` cannot be matched at any position. Now, the regex engine has nothing to backtrack to, and the overall regex fails. The backtracking steps created by `\d+` have been discarded. It never gets to the point where the lookahead captures only `12`.

Obviously, the regex engine does try further positions in the string. If we change the subject string, the regex `(?=(\d+))\w+\1` does match `56x56` in `456x56`.

https://www.regular-expressions.info/lookaround.html#:~:text=Lookaround%20Is%20Atomic
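
To check the two claims quoted above, here is a minimal sketch using Python's `re` module, whose lookaheads are atomic as well. The helper name `first_match` is my own, not something from the article.

```python
import re

# The pattern from the quoted article.
PATTERN = re.compile(r"(?=(\d+))\w+\1")

def first_match(subject):
    """Return (start, whole match, group 1) for the first match, or None."""
    m = PATTERN.search(subject)
    return None if m is None else (m.start(), m.group(0), m.group(1))

print(first_match("123x12"))  # None
print(first_match("456x56"))  # (1, '56x56', '56')
```

Note that the match in `456x56` starts at index 1, not index 0, so `\1` holds `56` there, not `456`.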

As far as I understand, the lookaround does nothing but capture an immutable group up front, since "the regex engine forgets about everything inside the lookaround", and immediately passes it to `\1`. So in the first case, `123x12`, `\1` is `123`, and in the second case it is `456`.
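
To see what the lookahead captures at each starting position, here is a rough trace, again in Python's `re` and under the assumption that other atomic-lookahead flavors behave the same way. The function `trace` and the two compiled patterns are illustrative names of mine. It anchors an attempt at every index with `Pattern.match(subject, pos)` and records the capture and the overall result.

```python
import re

LOOKAHEAD = re.compile(r"(?=(\d+))")       # the lookahead on its own
FULL = re.compile(r"(?=(\d+))\w+\1")       # the complete pattern

def trace(subject):
    """For each start index: (index, what the lookahead captures, full match)."""
    rows = []
    for pos in range(len(subject)):
        la = LOOKAHEAD.match(subject, pos)  # attempt anchored at pos
        full = FULL.match(subject, pos)     # full pattern anchored at pos
        rows.append((pos,
                     la.group(1) if la else None,
                     full.group(0) if full else None))
    return rows

for row in trace("123x12"):
    print(row)
# (0, '123', None)
# (1, '23', None)
# (2, '3', None)
# (3, None, None)
# (4, '12', None)
# (5, '2', None)
```

The capture is fixed only within a single attempt. When the engine advances to the next start index, the lookahead is evaluated afresh, which is how `456x56` ends up with `56` in `\1` at index 1.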

Sam
  • Isn't it obvious? The regex engine parses the string from left to right. `\d+` is greedy, so `123` != `12`, `23` then != `12`, failure occurs. In case of `56x56`, `56` == `56`. Besides, remember that lookahead patterns are atomic, i.e. once the value is calculated, it is not re-calculated. In PCRE2, there are non-atomic lookaheads, and then [`(?*(\d+))\w+\1` works](https://regex101.com/r/y5xzPR/2). – Wiktor Stribiżew Feb 24 '23 at 11:36
  • So, it is also - again - a question of choosing the regex flavor. All regex questions must have a tag specifying the regex flavor/library. Regex does not always work the same across languages/libraries. – Wiktor Stribiżew Feb 24 '23 at 12:09
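
For flavors without a non-atomic lookahead (Python's built-in `re` has no `(?*...)`), the behavior mentioned in the comments can be approximated by hand. The sketch below is only an emulation I am assuming is close enough for this one pattern, not how PCRE2 implements it. At each start index it tries the digit run from longest to shortest as the capture, then checks `\w+` followed by that capture. The name `nonatomic_search` is hypothetical.

```python
import re

DIGITS = re.compile(r"\d+")

def nonatomic_search(subject):
    """Emulate (?*(\d+))\w+\1: let the capture shrink when the rest fails."""
    for start in range(len(subject)):
        run = DIGITS.match(subject, start)
        if run is None:
            continue  # lookahead fails at this index
        digits = run.group(0)
        # Greedy first, then progressively shorter captures (the backtracking
        # an atomic lookahead throws away).
        for length in range(len(digits), 0, -1):
            captured = digits[:length]
            rest = re.compile(r"\w+" + re.escape(captured))
            m = rest.match(subject, start)
            if m:
                return m.group(0), captured
    return None

print(nonatomic_search("123x12"))  # ('123x12', '12')
print(nonatomic_search("456x56"))  # ('56x56', '56')
```

With the capture allowed to shrink from `123` to `12`, `123x12` matches at index 0, which is the case the atomic lookahead never reaches.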
