
I'm not sure whether grammars are meant to do such things: I want tokens to be defined at runtime (in the future, with data from a file). So I wrote some simple test code, and as expected it wouldn't even compile.

grammar Verb {
  token TOP {
    <root> 
    <ending>
  }
  token root {
    (\w+) <?{ ~$0 (elem) @root }>
  }
  token ending {
    (\w+) <?{ ~$0 (elem) @ending }>
  }
}

my @root = <go jump play>;
my @ending = <ing es s ed>;

my $string = "going";
my $match = Verb.parse($string);
.Str.say for $match<root>;

What's the best way of doing such things in Perl 6?

Eugene Barsky

2 Answers


To match any of the elements of an array, just write the name of the array variable (starting with a @ sigil) in the regex:

my @root = <go jump play>;
say "jumping" ~~ / @root /;        # Matches 「jump」
say "jumping" ~~ / @root 'ing' /;  # Matches 「jumping」
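The same interpolation works for the endings too. As a minimal sketch (the `split-ending` helper and the test words are made up for illustration, not part of the original answer), a plain regex with a frugal `\w+?` can split a word into a stem and one of the known endings:

```raku
my @ending = <ing es s ed>;

# Hypothetical helper: splits a word into a stem and one of the known endings.
# Returns a Match on success and Nil when no ending fits.
sub split-ending(Str $word) {
    # The elements of @ending are matched as literal strings.
    # \w+? is frugal, and a plain regex (unlike a token) may backtrack,
    # so the stem grows until one of the endings fits at the end of the word.
    $word ~~ / ^ (\w+?) (@ending) $ /;
}

for <going goes played gone> -> $word {
    with split-ending($word) -> $m {
        say "$word = { $m[0] } + { $m[1] }";
    }
    else {
        say "$word: no known ending";
    }
}
```

This is only meant to show the array matching outside a grammar; inside a grammar the token-based versions below are the better fit.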

So in your use-case, the only tricky part is passing the arrays from the code that creates them (e.g. by parsing data files), to the grammar tokens that need them.

The easiest way would probably be to make them dynamic variables (signified by the * twigil):

grammar Verb {
    token TOP {
        <root> 
        <ending>
    }
    token root {
        @*root
    }
    token ending {
        @*ending
    }
}

my @*root = <go jump play>;
my @*ending = <ing es s ed>;

my $string = "going";
my $match = Verb.parse($string);

say $match<root>.Str;
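Since the question mentions loading the data from a file, here is a rough sketch of how that could look with dynamic variables. The `parse-verb` helper, the file names, and the one-word-per-line file format are all assumptions made for this example:

```raku
grammar Verb {
    token TOP    { <root> <ending> }
    token root   { @*root }
    token ending { @*ending }
}

# Hypothetical helper: reads one word per line from each file, then parses $word.
sub parse-verb(Str $word, IO::Path $roots-file, IO::Path $endings-file) {
    # Dynamic variables are visible to everything called from this scope,
    # including the grammar's tokens.
    my @*root   = $roots-file.lines;
    my @*ending = $endings-file.lines;
    Verb.parse($word);
}

# Demo data written to temporary files (the file names are made up).
my $roots-file   = $*TMPDIR.add('verb-roots.txt');
my $endings-file = $*TMPDIR.add('verb-endings.txt');
$roots-file.spurt("go\njump\nplay\n");
$endings-file.spurt("ing\nes\ns\ned\n");

my $loaded-match = parse-verb('jumped', $roots-file, $endings-file);
say $loaded-match<root>.Str;
say $loaded-match<ending>.Str;
```

Declaring the dynamic variables inside the helper keeps them scoped to a single parse, so different calls can use different word lists.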

Another way would be to pass a Capture with the arrays to the args adverb of method .parse, which will pass them on to token TOP, from where you can in turn pass them on to the sub-rules using the <foo(...)> or <foo: ...> syntax:

grammar Verb {
    token TOP (@known-roots, @known-endings) {
        <root: @known-roots>
        <ending: @known-endings>
    }
    token root (@known) {
        @known
    }
    token ending (@known) {
        @known
    }
}

my @root = <go jump play>;
my @ending = <ing es s ed>;

my $string = "going";
my $match = Verb.parse($string, args => \(@root, @ending));

say $match<root>.Str;  # go
smls
  • Wow, it's absolutely marvelous, especially matching an array! – Eugene Barsky Oct 21 '17 at 09:02
  • Great suggestions! However, I'm having trouble finding documentation on the `<foo(...)>` and `<foo: ...>` syntax you mentioned. Would it be possible to post a link to such documentation? – Enheh Aug 15 '18 at 11:42
  • @Enheh: It seems to be missing from p6doc, unfortunately... :/ But the original design docs mention it: https://design.perl6.org/S05.html#line_1490 – smls Aug 23 '18 at 21:34

The approach you were taking could have worked but you made three mistakes.

Scoping

Lexical variable declarations need to appear textually before the compiler encounters their use:

my $foo = 42; say $foo; # works
say $bar; my $bar = 42; # compile time error

Backtracking

say .parse: 'going' for

  grammar using-token              {token TOP {         \w+ ing}}, # Nil
  grammar using-regex-with-ratchet {regex TOP {:ratchet \w+ ing}}, # Nil
  grammar using-regex              {regex TOP {         \w+ ing}}; # 「going」

The regex declarator has exactly the same effect as the token declarator except that it defaults to doing backtracking.

Your first use of \w+ in the root token matches the entire input 'going', which then fails to match any element of @root. And then, because there's no backtracking, the overall parse immediately fails.

(Don't take this to mean that you should default to using regex. Relying on backtracking can massively slow down parsing and there's typically no need for it.)

Debugging

See https://stackoverflow.com/a/19640657/1077672


This works:

my @root = <go jump play>;
my @ending = <ing es s ed>;

grammar Verb {
  token TOP {
    <root> 
    <ending>
  }
  regex root {
    (\w+) <?{ ~$0 (elem) @root }>
  }
  token ending {
    (\w+) <?{ ~$0 (elem) @ending }>
  }
}

my $string = "going";
my $match = Verb.parse($string);

.Str.say for $match<root>;

outputs:

go
raiph
  • That's a great comment! After many tests today with different `tokens` of `\w+` type followed by an ending, I finally figured out that it won't match without backtracking and changed `token` to `regex`. – Eugene Barsky Oct 21 '17 at 17:46
  • @evb Note one can write either `regex root { ` or `token root { :!ratchet `. They mean exactly the same thing. – raiph Oct 21 '17 at 21:42