
Here is the input I am trying to build an AST for:

{{ name }}
{{ name | option }}
{{ name | option1 | option2 }}
{{ name | key=value }}
{{ name | option1 | key=value }}
{{ name | option1 | {{ another }} | option3 }}

So in practice there is always a name (a..z, A..Z, 0..9), and each option is either a key=value pair or a simple option without a value.

I am trying to write a lexer/parser grammar for it with ANTLR, but it keeps complaining about different things. Here is my best shot:

start   :   box+;
box :   '{{' Name  ('|'  Options )* '}}';
Options :   (SimpleOption | KeyValue | box);
Name    :   ID;
SimpleOption:   ID;
KeyValue:       ID '=' ID;
fragment
 ID  :  ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;
WS  :   ( ' ' | '\t' | '\r' | '\n'  {$channel=HIDDEN;}  ;

This is obviously wrong, because Name and SimpleOption are ambiguous. Even inlining the rule does not help:

box :   '{{' Name  ('|'  (ID | KeyValue | box) )* '}}';

It never picks up KeyValue and throws a mismatch exception when it encounters '='.

How would you write this grammar?

el_shayan
  • `Name` shouldn't be ambiguous because an ID will always be reduced to that if it is the first token after `{{`, and never otherwise. The answer given looks like the ambiguity to me. – Lucero Jul 04 '12 at 22:04

2 Answers


You're using far too many lexer rules. As a lexer rule, KeyValue only matches ID '=' ID with no spaces around the = sign; it should be a parser rule (parser rule names start with a lowercase letter). Only as a parser rule can it have spaces around the =, because the lexer discards the whitespace before the parser sees the tokens.
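To illustrate the whitespace point, here is a rough Python analogy. It is not ANTLR itself, and the names `KEY_VALUE_TOKEN`, `tokenize` and `is_key_value` are made up for this sketch. A single-token pattern for `ID=ID` fails as soon as spaces appear around the `=`, whereas tokenizing first and discarding whitespace lets a parser-level check accept both spellings.

```python
import re

ID = r"[A-Za-z_][A-Za-z0-9_]*"

# Analogue of the lexer rule  KeyValue : ID '=' ID;  (one token, no room for spaces).
KEY_VALUE_TOKEN = re.compile(rf"{ID}={ID}")

# Analogue of a lexer that only knows ID and '=' and skips leading whitespace.
TOKEN = re.compile(rf"\s*({ID}|=)")


def tokenize(text):
    """Split text into ID and '=' tokens, discarding the whitespace between them."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def is_key_value(tokens):
    """Analogue of the parser rule  key_value : ID '=' ID;  applied to a token list."""
    return (len(tokens) == 3 and tokens[1] == "="
            and tokens[0] != "=" and tokens[2] != "=")


print(KEY_VALUE_TOKEN.fullmatch("k=v") is not None)    # token-level match, no spaces
print(KEY_VALUE_TOKEN.fullmatch("k = v") is not None)  # token-level match, with spaces
print(is_key_value(tokenize("k = v")))                 # parser-level match, with spaces
```

The token-level pattern rejects `k = v`, while the parser-level check accepts it because the whitespace is already gone by the time the three tokens are compared.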

Be sure you understand the difference between lexer and parser rules! See: Practical difference between parser rules and lexer rules in ANTLR?

This should do it:

grammar T;

start     : box+ EOF;
box       : '{{' ID ('|' opts)* '}}';
opts      : key_value | ID | box; // note that 'options' is a reserved word in ANTLR!
key_value : ID '=' ID;
ID        : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_')*;
WS        : (' ' | '\t' | '\r' | '\n') {skip();};

which would parse the input

{{ name | option1 = value1 | {{ another | k=v }} | option3 }}

as follows:

[image: parse tree for the input above]
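For readers without ANTLR at hand, the shape of that tree can be approximated with a hand-written recursive-descent parser. The following is only a sketch in Python that mirrors the rules `start`, `box`, `opts` and `key_value` from grammar T; the `Parser` class, the tuple-based node format and the `parse` helper are assumptions of this example, not something ANTLR generates.

```python
import re

# Tokens mirroring the grammar: '{{', '}}', '|', '=', ID. Whitespace is skipped.
TOKEN = re.compile(r"\s*(\{\{|\}\}|\||=|[A-Za-z_][A-Za-z0-9_]*)")
PUNCT = {"{{", "}}", "|", "="}


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self, offset=0):
        i = self.pos + offset
        return self.tokens[i] if i < len(self.tokens) else None

    def expect(self, literal):
        if self.peek() != literal:
            raise SyntaxError(f"expected {literal!r}, got {self.peek()!r}")
        self.pos += 1

    def ident(self):
        tok = self.peek()
        if tok is None or tok in PUNCT:
            raise SyntaxError(f"expected ID, got {tok!r}")
        self.pos += 1
        return tok

    def start(self):
        # start : box+ EOF;
        boxes = [self.box()]
        while self.peek() is not None:
            boxes.append(self.box())
        return boxes

    def box(self):
        # box : '{{' ID ('|' opts)* '}}';
        self.expect("{{")
        name = self.ident()
        options = []
        while self.peek() == "|":
            self.pos += 1
            options.append(self.opts())
        self.expect("}}")
        return ("box", name, options)

    def opts(self):
        # opts : key_value | ID | box;
        if self.peek() == "{{":
            return self.box()
        if self.peek(1) == "=":  # one extra token of lookahead picks key_value
            key = self.ident()
            self.expect("=")
            return ("key_value", key, self.ident())
        return ("option", self.ident())


def parse(text):
    return Parser(tokenize(text)).start()


print(parse("{{ name | option1 = value1 | {{ another | k=v }} | option3 }}"))
```

Deciding between `key_value` and a plain `ID` takes a single extra token of lookahead here, which is the same decision ANTLR's parser makes on its own once `key_value` is a parser rule.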

Bart Kiers

Does this work for you:

Options :   (SimpleOptionOrKeyValue | box);
SimpleOptionOrKeyValue:   ID ( '=' ID | );

This eliminates the need for lookahead for the = sign. (Edited to reverse order of appearance inside the parens, not sure how ANTLR handles this.)

The distinction between simple option and key-value can then be carried out at the semantic level.

Perhaps related: ANTLR How to use lexer rules having same starting?

krlmlr
  • "Can't look backwards more than one token" is the Antlr's answer to this solution :( - but thanks for the links – el_shayan Jul 04 '12 at 22:32
  • Does it work then if you create a new rule for `'=' ID`? Like: `ValueForKeyValue: ('=' ID);` – krlmlr Jul 05 '12 at 06:23
  • No offence, but this answer is wrong and should be removed (no idea why it got upvoted). `SimpleOptionOrKeyValue` is still a lexer rule, while it should be a parser rule, and ANTLR wouldn't have an issue looking ahead to see if there's a `=` or not: ANTLR's parser is `LL(*)`. – Bart Kiers Jul 05 '12 at 17:39
  • @BartKiers: Claiming an answer to be "wrong" is bold. It's not elegant, yes. There are better ways, yes. I haven't ever used ANTLR and don't know the difference between lowercase and uppercase rules, yes. But is the language regular? Yes. Can it then be handled by the lexer alone provided that it can handle regular languages? Yes. Can the lexer handle regular languages? No idea. Does this qualify for a "wrong" answer? – krlmlr Jul 05 '12 at 18:30
  • Look, the OP is asking why things go wrong in his grammar, you answer his question (containing misleading information) but this answer of yours does not resolve the OP's problem(s) (the problem is the over-use of lexer rules). I don't know what else to call an answer that does not solve the issue at hand but "wrong"... But, fair enough, let me restate that: the answer is IMO wrong. It does not solve the OP's problem and implies that ANTLR needs extra lookahead to "see" past the key-value rule. – Bart Kiers Jul 05 '12 at 18:41