To clarify, I want to match:

ab
aabb
aaabbb
...

This works in Perl:

if ($exp =~ /^(a(?1)?b)$/)

To understand this, look at the string as though it grows from the outside-in, not left-right:

ab
a(ab)b
aa(ab)bb

(?1) is a reference to the outer set of parentheses. We need the ? after it for the last case: going from the outside in, eventually nothing is left, and ? means 0 or 1 of the preceding expression, so it essentially acts as our base case.

I posted a similar question asking what the Java equivalent of (?1) is. Today I found out that \\1 refers to the first capturing group, so I assumed that this would work:

String pattern = "^(a(?:\\1)?b)$";

but it did not. Does anyone know why?

NB: I know there are other, better ways to do this. This is strictly an educational question: I want to know why this particular approach does not work and whether there is a way to fix it.

Steve P.
    Does `(?1)` mean "match something of the form specified by the regex in the first set of parentheses", or "match the same thing the first capture group matched"? `\\1` means the second thing, but the only way I could see that Perl regex working is if `(?1)` meant the first thing. – user2357112 Jul 10 '13 at 00:20
    I found this note in the Perl RE [documentation](http://perldoc.perl.org/perlre.html): "Note that this pattern does not behave the same way as the equivalent PCRE or Python construct of the same form. In Perl you can backtrack into a recursed group, in PCRE and Python the recursed into group is treated as atomic." Not sure if this is the same thing, but it could explain why this doesn't work the same way in Perl and Java. – ajb Jul 10 '13 at 00:36

1 Answer

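The \\1 is a backreference: it refers to the text the group has matched, not to the group's pattern, which is what the recursion (?1) refers to in Perl. Inside group 1 itself, the group has not finished matching yet, so \\1 has nothing to refer to and the optional part always matches empty. Unfortunately, Java regexes do not support recursion, but the pattern can be expressed using lookarounds and backreferences.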
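As a rough illustration, here is a minimal sketch that compares the pattern from the question with one possible lookahead-plus-backreference formulation. The class name `BalancedAb`, the helper names, and the second pattern itself are my own choices for this example, not something taken from the question. The idea is that each `a` consumed by the loop looks ahead and grows group 1 by one more `b` (using the possessive `\\1?+` so the previous capture is kept whole), and the trailing `\\1` then has to match exactly that many `b`s.

```java
import java.util.regex.Pattern;

public class BalancedAb {
    // The attempt from the question. \1 sits inside group 1, so when it is
    // tried, group 1 has not captured anything yet and the backreference fails;
    // the optional (?:\1)? therefore always matches the empty string.
    static final Pattern BACKREF_ATTEMPT = Pattern.compile("^(a(?:\\1)?b)$");

    // Assumed workaround (one possible formulation, not the only one):
    // for every 'a' consumed, the lookahead skips the remaining a's and
    // re-captures group 1 as "previous group 1 plus one more b".
    // After the loop, \1 must match the captured run of b's, then the end.
    static final Pattern LOOKAHEAD = Pattern.compile("^(?:a(?=a*(\\1?+b)))+\\1$");

    static boolean attemptMatches(String s) {
        return BACKREF_ATTEMPT.matcher(s).matches();
    }

    static boolean balanced(String s) {
        return LOOKAHEAD.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"ab", "aabb", "aaabbb", "aab", "abb", ""};
        for (String s : samples) {
            System.out.println("\"" + s + "\""
                    + " attempt=" + attemptMatches(s)
                    + " balanced=" + balanced(s));
        }
    }
}
```

Running `main` should show that the pattern from the question accepts only `ab`, while the lookahead version accepts `ab`, `aabb`, `aaabbb` and rejects the unbalanced strings. This relies on Java treating a backreference to a group that has not captured anything as a failed match.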

amon