This is a followup to this question.
Have a look at this pattern:
(o(?1)?o)
It matches any sequence of o with a length of 2^n, with n ≥ 1.
It works, see regex101.com (word boundaries added for better demonstration).
The question is: Why?
In the following, the description of a string (match or not) will simply be a bolded number or a bolded term that describes the length, like 2n.
Broken down (with added whitespaces):
( o (?1)? o )
(           ) # Capture group 1
  o       o   # Matches an o each at the start and the end of the group
              # -> the pattern matches from the outside to the inside.
    (?1)?     # Again the regex of group 1, or nothing.
              # -> Again one 'o' at the start and one at the end. Or nothing.
I don't understand why this matches 2^n rather than 2n, because I would describe the pattern as "an undefined number of o o pairs, nested inside each other".
Visualization:
No recursion, 2 is a match:
oo
One recursion, 4 is a match:
o  o
 oo
So far, so easy.
Two recursions. Obviously wrong because the pattern does not match 6:
o    o
 o  o
  oo
But why? It seems to fit the pattern.
I conclude that it's not simply the plain pattern that is repeated because otherwise 6 would have to match.
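To make that expectation concrete, here is a minimal Python sketch of the "plain repetition" reading. It is only an illustration of my mental model, not how any engine implements recursion. Python's built-in `re` has no `(?1)`, so the helper `expand` (a hypothetical name) substitutes the group into itself by hand, a fixed number of times, using non-capturing groups.

```python
import re


def expand(depth: int) -> str:
    """Nest 'o ... o' around an optional inner copy, `depth` times."""
    pattern = "oo"  # innermost level: the optional call matches nothing
    for _ in range(depth):
        # Wrap the previous level the way (o(?1)?o) would, read naively.
        pattern = f"o(?:{pattern})?o"
    return pattern


# Three levels of nesting are enough to cover lengths up to 8.
naive = re.compile(expand(3))
matched = [n for n in range(1, 9) if naive.fullmatch("o" * n)]
print(matched)  # prints [2, 4, 6, 8]
```

Under this reading every even length matches, including 6, which is exactly what the real pattern does not do.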
But according to regular-expressions.info:
(?P<name>[abc])(?1)(?P>name) matches three letters like (?P<name>[abc])[abc][abc] does.
and
([abc])(?1){3} [...] is equivalent to ([abc])[abc]{3}
So it does seem to simply rematch the regex code, without any information about the previous match of the capture group.
Can someone explain, and maybe visualize, why this pattern matches 2^n and nothing else?
Edit:
It was mentioned in the comments:
I doubt that referencing a capture group inside of itself is actually a supported case.
regular-expressions.info does mention the technique:
If you place a call inside the group that it calls, you'll have a recursive capturing group.