How do I match using :global in Raku grammar?

Question

I'm trying to write a Raku grammar that can parse commands that ask for programming puzzles.

This is a simplified version just for my question, but the commands combine a difficulty level with an optional list of languages.

Sample valid input:

No language: easy
One language: hard javascript
Multiple languages: medium javascript python raku

I can get it to match one language, but not multiple languages. I'm not sure where to add the :g.

Here's an example of what I have so far:

grammar Command {
    rule TOP { <difficulty> <languages>? }

    token difficulty { 'easy' | 'medium' | 'hard' }

    rule languages { <language>+ }
    token language { \w+ }
}

multi sub MAIN(Bool :$test) {
    use Test;
    plan 5;

    # These first 3 pass.
    ok Command.parse('hard', :token<difficulty>), '<difficulty> can parse a difficulty';

    nok Command.parse('no', :token<difficulty>), '<difficulty> should not parse random words';

    # Why does this parse <languages>, but <language> fails below?
    ok Command.parse('js', :rule<languages>), '<languages> can parse a language';

    # These last 2 fail.
    ok Command.parse('js', :token<language>), '<language> can parse a language';

    # Why does this not match both words? Can I use :g somewhere?
    ok Command.parse('js python', :rule<languages>), '<languages> can parse multiple languages';
}

This works, even though my test #4 fails:

my token wrd { \w+ }
'js' ~~ &wrd;  #=> ｢js｣

Extracting multiple languages works with a regex using this syntax, but I'm not sure how to use that in a grammar:

'js python' ~~ m:g/ \w+ /;  #=> (｢js｣ ｢python｣)

Also, is there an ideal way to make the order unimportant so that difficulty could come anywhere in the string? Example:

rule TOP { <languages>* <difficulty> <languages>? }

Ideally, I'd like for anything that is not a difficulty to be read as a language. Example: raku python medium js should read medium as a difficulty and the rest as languages.

user0721090601 · Accepted Answer · 2020-12-05T12:14:04.463

There are two things at issue here.

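To specify a subrule in a grammar parse, the named argument is always :rule, regardless of whether it's declared in the grammar as a rule, token, method, or regex. Your first two tests only appear to work: the :token named argument is unknown, so it is silently ignored and both strings are parsed against TOP. 'hard' happens to be a valid full-grammar parse and 'no' is not, so both tests pass. Test 4 fails for the same reason, because 'js' is not a valid TOP parse.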

That gets us:

ok  Command.parse('hard',      :rule<difficulty>), '<difficulty> can parse a difficulty';
nok Command.parse('no',        :rule<difficulty>), '<difficulty> should not parse random words';
ok  Command.parse('js',        :rule<languages> ), '<languages> can parse a language';
ok  Command.parse('js',        :rule<language>  ), '<language> can parse a language';
ok  Command.parse('js python', :rule<languages> ), '<languages> can parse multiple languages';

# Output
ok 1 - <difficulty> can parse a difficulty
ok 2 - <difficulty> should not parse random words
ok 3 - <languages> can parse a language
ok 4 - <language> can parse a language
not ok 5 - <languages> can parse multiple languages
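
To see the ignored named argument in isolation, here is a minimal sketch. The grammar name Demo and its two tokens are made up for illustration and are not part of your program:

```raku
# Hypothetical grammar, for illustration only.
grammar Demo {
    token TOP  { 'top' }
    token word { \w+ }
}

# :token is not a recognised option, so it is ignored and TOP is used.
say so Demo.parse('anything', :token<word>);  # False
# :rule selects the subrule, whatever kind of declarator it uses.
say so Demo.parse('anything', :rule<word>);   # True
```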

The second issue is how implied whitespace is handled in a rule. In a token, the following are equivalent:

token foo { <alpha>+  }
token bar { <alpha> + }

But in a rule, they would be different. Compare the token equivalents for the following rules:

rule  foo { <alpha>+       } 
token foo { <alpha>+ <.ws> }

rule  bar { <alpha> +         }
token bar { [<alpha> <.ws>] + }
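
A quick way to check the difference is to parse a space-separated string with both forms. This is only a sketch; the grammar name Spacing and the rule names tight and loose are invented for the example:

```raku
# Hypothetical grammar, for illustration only.
grammar Spacing {
    rule tight { <alpha>+  }   # whitespace is matched only after the whole run
    rule loose { <alpha> + }   # whitespace is matched after every <alpha>
}

say so Spacing.parse('a b c', :rule<tight>);  # False
say so Spacing.parse('a b c', :rule<loose>);  # True
```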

In your case, you have <language>+, and since language is \w+, it's impossible to match two: the first one consumes all the \w characters, and nothing is allowed to match the whitespace before the next word. The fix is easy: change <language>+ to <language> +.
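
Applied to the grammar from the question, the change would look roughly like this (a sketch: everything except the one space in languages is copied from your code):

```raku
grammar Command {
    rule TOP { <difficulty> <languages>? }

    token difficulty { 'easy' | 'medium' | 'hard' }

    # The space before + lets whitespace match after each <language>.
    rule languages { <language> + }
    token language { \w+ }
}

say so Command.parse('js python', :rule<languages>);  # True

my $m = Command.parse('medium javascript python raku');
say ~$m<difficulty>;                     # medium
say $m<languages><language>.join(', ');  # javascript, python, raku
```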

To allow the <difficulty> token to float around, the first solution that jumps to my mind is to check for it inside the <language> token and bail out if it's there:

token language { <!difficulty> \w+ }

<!foo> fails if <foo> can match at that position. This works almost perfectly until you get a language like 'easyFoo', which gets rejected because it starts with a difficulty. The easy fix there is to ensure that the difficulty token always ends at a word boundary:

token difficulty {
   [
   | easy
   | medium
   | hard
   ]
   >> 
}

where >> asserts a word boundary on the right.
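
Putting the pieces together, one possible shape for the whole grammar is sketched below. The grammar name FloatingCommand is made up, and the TOP rule with an optional <languages> on each side of <difficulty> is an assumption based on the example in your question, not the only way to arrange it:

```raku
# Hypothetical name; TOP layout assumed from the question's example.
grammar FloatingCommand {
    rule TOP { <languages>? <difficulty> <languages>? }

    # >> keeps 'easyFoo' from being read as the difficulty 'easy'.
    token difficulty { [ easy | medium | hard ] >> }

    rule languages { <language> + }
    # Refuse to treat a difficulty word as a language.
    token language { <!difficulty> \w+ }
}

my $m = FloatingCommand.parse('raku python medium js');
say so $m;            # True
say ~$m<difficulty>;  # medium

say ~FloatingCommand.parse('easyFoo hard')<difficulty>;  # hard
```

Because <languages> appears twice in TOP, $m<languages> holds a list of matches (one per side that matched) rather than a single match.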

Thanks, I'm going to add these changes to my program in the morning. — R891, Dec 05 '20 at 12:23
See [When is white space really important in Raku grammars?](https://stackoverflow.com/questions/48892306/when-is-white-space-really-important-in-perl6-grammars) for discussion that elaborates on both the issues that @user0721090601 explains underlie all the failures you were having. The fact that unhandled named arguments are ignored, which has practical and strategic evolutionary benefits, has the downside that it's currently done without a warning. For now, that downside is just something you need to be aware of. Aiui, Raku, Rakudo, and/or CommaIDE may provide relief in years to come. — raiph, Dec 05 '20 at 12:39
