I'm implementing a new DSL in Marpa and (coming from Regexp::Grammars) I'm more than satisfied. My language supports a bunch of unary and binary operators, objects with C-style identifiers and method calls using the familiar dot notation. For example:

foo.has(bar == 42 AND baz == 23)

I found the prioritized rules feature offered by Marpa's grammar description language and have come to rely on it a lot, so nearly everything lives in a single G1 rule, Expression. Excerpt (many alternatives and the semantic actions omitted for brevity):

Expression ::=
      NumLiteral
    | '(' Expression ')'             assoc => group
   || Expression ('.') Identifier
   || Expression ('.') Identifier Args
    | Expression ('==') Expression
   || Expression ('AND') Expression

Args     ::= ('(') ArgsList (')')
ArgsList ::= Expression+             separator => [,]

Identifier         ~ IdentifierHeadChar IdentifierBody
IdentifierBody     ~ IdentifierBodyChar*
IdentifierHeadChar ~ [a-zA-Z_]
IdentifierBodyChar ~ [a-zA-Z0-9_]

NumLiteral ~ [0-9]+

As you can see, I'm using the Scanless interface (SLIF). My problem is that this also parses, for example:

foo.AND(5)

Marpa knows that there can only be an identifier after a dot, so it doesn't even consider the fact that AND might be a keyword. I know that I can avoid that problem by doing a separate lexing stage that identifies AND as a keyword explicitly, but that tiny papercut is not quite worth the effort.

Is there a way in SLIF to restrict the Identifier rule to non-keyword identifiers only?

Stefan Majewsky
  • What do you mean by "keyword"? `assoc` and `separator` are keywords in the Marpa lingo. – choroba Nov 24 '14 at 17:04
  • @choroba, He means that if he defines an operator `AND`, he doesn't want it to be allowed as an identifier. – ikegami Nov 24 '14 at 17:13
  • I haven't tested this, but you might want to look at the 'latm' adverb. This allows you to turn off Marpa's knowledge of what lexeme is acceptable where, on a per-lexeme basis -- in effect making it "stupid" for that one lexeme, so that it will think that an 'AND' is OK, and then fail the parse, as you want. Couple this perhaps with a higher lexeme priority so that 'AND' as an operator is preferred over 'AND' as an identifier. Off the top of my head, but hope it helps. – Jeffrey Kegler Nov 25 '14 at 00:41

2 Answers

I don't know how to express such a thing in the grammar. You can introduce an intermediate non-terminal for Identifier which would check the condition, though:

#!/usr/bin/perl
use warnings;
use strict;
use Syntax::Construct qw{ // };

use Marpa::R2;

my %reserved = map { $_ => 1 } qw( AND );

my $grammar = 'Marpa::R2::Scanless::G'->new(
    { bless_package => 'main',
      source => \( << '__GRAMMAR__'),

:default ::= action => store

:start ::= S
S ::= Id
  | Id NumLiteral
Id ::= Identifier action => allowed

Identifier         ~ IdentifierHeadChar IdentifierBody
IdentifierBody     ~ IdentifierBodyChar*
IdentifierHeadChar ~ [a-zA-Z_]
IdentifierBodyChar ~ [a-zA-Z0-9_]

NumLiteral ~ [0-9]+

:discard ~ whitespace
whitespace ~ [\s]+

__GRAMMAR__
    });

# The third input dies inside the `allowed` action, because AND is reserved.
for my $input ('ABC', 'ABC 42', 'AND 1') {
    my $value_ref = $grammar->parse(\$input, 'main');
    print ${$value_ref}, "\n";
}


sub store {
    my (undef, $id, $arg) = @_;
    $arg //= 'null';
    return "$id $arg";
}

sub allowed {
    my (undef, $id) = @_;
    die "Reserved keyword $id" if $reserved{$id};
    return $id
}
choroba
  • oops, doh! [will self-destruct] – ikegami Nov 24 '14 at 18:21
  • I tried various permutations of the `priority` and `latm` lexeme adverbs, but this is the only thing that actually made my unit test green. The error messages might not be as pretty, but at least it accepts the grammar correctly. – Stefan Majewsky Nov 25 '14 at 14:12
  • @StefanMajewsky: A "negative rule" would be nice to have. Thanks for an interesting question. – choroba Nov 25 '14 at 14:17
  • There might be more to come where that one came from. ;) I might look into building an autocompleting code editor for this DSL. – Stefan Majewsky Nov 25 '14 at 14:32

You can use lexeme priorities, which are intended for exactly this kind of thing; there is an example here in the Marpa::R2 test suite.

Basically, you declare an <AND keyword> ~ 'AND' lexeme and give it priority 1, so that it is preferred over Identifier. That should do the trick.
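
As a rough, untested sketch, the declaration could look like the SLIF fragment below. Only the lexeme name <AND keyword> and the priority value 1 come from the text above. The rest is an assumption based on the question's grammar, including the Identifier alternative added so that foo is an Expression. Whether this actually rejects AND after a dot is not verified here, and it may depend on the latm setting in effect for these lexemes.

```
Expression ::=
      NumLiteral
    | Identifier
   || Expression ('.') Identifier
   || Expression (<AND keyword>) Expression

# Named lexeme for the keyword, so that a priority can be attached to it.
# Lexemes default to priority 0, so 1 ranks it above Identifier.
:lexeme ~ <AND keyword> priority => 1
<AND keyword> ~ 'AND'

Identifier         ~ IdentifierHeadChar IdentifierBody
IdentifierBody     ~ IdentifierBodyChar*
IdentifierHeadChar ~ [a-zA-Z_]
IdentifierBodyChar ~ [a-zA-Z0-9_]

NumLiteral ~ [0-9]+
```

The quoted 'AND' in the G1 rule is replaced by a named lexeme because the :lexeme pseudo-rule needs a symbol name to attach the adverb to.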

P.S. I modified the above script slightly to give an example — code, output.

rns
  • Actions can be used, but that is not terribly efficient -- actions are called at the evaluation phase when the input is read. Events are better, see this gist -- https://gist.github.com/rns/d19b40ffc5523659dec9 -- `AND` identifier is rejected once it is met in the input. – rns May 08 '15 at 10:00
  • re [`$r->literal()`](https://metacpan.org/pod/distribution/Marpa-R2/pod/Scanless/R.pod#literal) -- yes, it can be used to access any input span, if you needed it. – rns May 12 '15 at 10:22
  • Using events to issue a warning asking for spaces around `-` in `12 34-56 78` needs more work -- https://gist.github.com/rns/962fdb4f30d0681cc07d -- it uses significant spaces (no `:discard`), marker symbols and nulled events, see https://metacpan.org/pod/distribution/Marpa-R2/pod/Event.pod – rns May 12 '15 at 11:08