10

I've been trying to figure out how to do a recursive regular expression in Perl 6. For a toy example, a balanced parentheses matcher, which would match ((())()) inside (((((())()).

  • PCRE example: /\((?R)?\)/

  • Onigmo example: (?<paren>\(\g<paren>*\))

I thought this would do it:

my regex paren {
  '(' ~ ')' <paren>*
}

or the simpler

my regex paren {
  '(' <paren>* ')'
}

but that fails with

No such method 'paren' for invocant of type 'Match'
in regex paren at ...
Amadan
  • 191,408
  • 23
  • 240
  • 301
  • 1
    See also [Parsing a possibly nested braced item using a grammar](https://stackoverflow.com/q/47124405/2173773) – Håkon Hægland Mar 01 '19 at 09:52
  • @HåkonHægland: Thanks, especially that [link](https://examples.perl6.org/categories/best-of-rosettacode/balanced-brackets.html) was a nice find. However, I was explicitly trying not to look at grammars since I want to find all matching spans, not parse a string from start, and I don't think grammars support that. That said, I am a noob at P6, so I am sure I'm missing something. – Amadan Mar 01 '19 at 09:58
  • 1
    @HåkonHægland I mean I guess I can make a grammar that has `nonparen` as stuff I don't want, and an action class that will collect the `paren` matches... but that gets complicated fast... It's just very hard to believe P6 regular expressions dropped support for something Perl basically pioneered. – Amadan Mar 01 '19 at 10:02

2 Answers2

15

You need to make explicit that you're calling a my-scoped regex:

my regex paren {
    '(' ~ ')' <&paren>*
}

Notice the & that has been added. With that:

say "(()())" ~~ /^<&paren>$/    # 「(()())」
say "(()()" ~~ /^<&paren>$/     # Nil

While it's true that you can sometimes get away without explicitly writing the &, and indeed could when using it:

say "(()())" ~~ /^<paren>$/    # 「(()())」
say "(()()" ~~ /^<paren>$/     # Nil

This only works because the compiler spots there is a regex defined in the lexical scope with the name paren so compiles the <paren> syntax into that. With the recursive case, the declaration isn't installed until after the regex is parsed, so one needs to be explicit.

raiph
  • 31,607
  • 3
  • 62
  • 111
Jonathan Worthington
  • 29,104
  • 2
  • 97
  • 136
2

You can use the ~~ in the meta-syntax to make a recursive callback into the current pattern or just a part of it. For example, you can match balanced parenthesis with the simple regex:

say "(()())" ~~ /'(' <~~>* ')'/;    # 「(()())」
say "(()()"  ~~ /'(' <~~>* ')'/;    # 「()」

Try it online!

Unfortunately, matching via a captured subrule (like ~~0) is not yet implemented.

Jo King
  • 590
  • 3
  • 17