
I am trying to work out how to handle ambiguities in ANTLR. I need to correctly parse identifiers that may be preceded by an optional size prefix. I first came up with this buggy grammar:

grammar PrefixProblem;
options       
{   
    language = Java;
}
goal: (size_prefix ':')? id;
size_prefix: B;
id: LETTER+;
LETTER: 'A'..'Z' ;
B: 'B';
WSFULL:(' '|'\r'|'\t'|'\u000C'|'\n') {$channel=HIDDEN;};

I need B to be parsed as an id, and B:B to be parsed as the id B with the prefix B. It didn't work.
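A likely reason, assuming ANTLR 3 prefers the lexer rule that is defined first when two rules match the same text: LETTER is listed before B, so the character B always becomes a LETTER token and the parser never sees a B token. The plain-Java sketch below only imitates that ordering rule; it is not ANTLR's lexer, and the names `FirstRuleWins`, `tokenTypes` and `COLON` are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class FirstRuleWins {
    // Lexer rules in definition order, mirroring the buggy grammar (LETTER before B).
    // COLON stands in for the implicit ':' literal token; the name is an assumption.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("LETTER", Pattern.compile("[A-Z]"));
        RULES.put("B", Pattern.compile("B"));
        RULES.put("COLON", Pattern.compile(":"));
    }

    // Returns the token type chosen for each non-whitespace character.
    static List<String> tokenTypes(String input) {
        List<String> types = new ArrayList<>();
        for (char c : input.toCharArray()) {
            if (Character.isWhitespace(c)) {
                continue; // whitespace is sent to the hidden channel in the grammar
            }
            String type = "UNKNOWN";
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                if (rule.getValue().matcher(String.valueOf(c)).matches()) {
                    type = rule.getKey();
                    break; // the first matching rule wins
                }
            }
            types.add(type);
        }
        return types;
    }

    public static void main(String[] args) {
        // The rule named B is never selected, so size_prefix can never match.
        System.out.println(tokenTypes("B:B"));
    }
}
```

Under this assumption the token stream for B:B contains no B token at all, which would explain why the optional `size_prefix ':'` part of `goal` never matches.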

Then I found two solutions to this problem.

grammar PrefixSolution1;
options       
{   
    language = Java;
}
goal: (size_prefix ':')? id;
size_prefix: B;
id: (LETTER | B)+;
LETTER: 'A' | 'C'..'Z' ;
B: 'B';
WSFULL:(' '|'\r'|'\t'|'\u000C'|'\n') {$channel=HIDDEN;};

In the grammar above, B was removed from the LETTER lexer rule and added back as an alternative in the id parser rule.

grammar PrefixSolution2;
options       
{   
    language = Java;
}
goal: PREFIX_VAR;
PREFIX_VAR: (B WSFULL* ':' WSFULL*)? ID;
fragment ID: (LETTER)+;
fragment LETTER: 'A'..'Z' ;
fragment B: 'B';
WSFULL:(' '|'\r'|'\t'|'\u000C'|'\n') {$channel=HIDDEN;};

Here I just moved the rule into the lexer.

The main con of PrefixSolution1 is that I need to split lexer rules into smaller chunks and then concatenate them again later in the parser.

PrefixSolution2: with this approach I always have to account explicitly for whitespace characters, even though they should be ignored.
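A related cost, sketched under the assumption that the text of a PREFIX_VAR token is the whole matched span, embedded whitespace included: any code that consumes the token has to take it apart again. The helper below is hypothetical (`PrefixVarText.split` is not part of ANTLR) and only shows the kind of post-processing that would be needed.

```java
class PrefixVarText {
    // Splits the text of a PREFIX_VAR token into {prefix, id}.
    // The prefix is null when the token has no "B :" part.
    static String[] split(String tokenText) {
        int colon = tokenText.indexOf(':');
        if (colon < 0) {
            return new String[] { null, tokenText.trim() };
        }
        // trim() drops the whitespace that the lexer rule matched with WSFULL*
        String prefix = tokenText.substring(0, colon).trim();
        String id = tokenText.substring(colon + 1).trim();
        return new String[] { prefix, id };
    }

    public static void main(String[] args) {
        String[] parts = split("B : ABC");
        System.out.println(parts[0] + " / " + parts[1]);
    }
}
```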

In my opinion, both solutions will lead to a big mess when writing a grammar for a whole language. Is there any other solution? If not, which approach is the better one?

All source code is available here

Overdose
  • OK, so you need to accept input like "B" and input like "B:B"; in the AST, "B" should be identified as an ID and "B:B" as an ID with a prefix? Did I get that right? – sm13294 Apr 26 '12 at 11:35

2 Answers


I wouldn't go with either of them. I'd simply create ID tokens and no B tokens (and no PREFIX_VAR tokens either: that distinction belongs in the parser).

You can match a capital B (capB) using a disambiguating semantic predicate [1] in a parser rule like this:

grammar Test;

goal
 : (prefixVar | ID)+ EOF
 ;

prefixVar
 : capB ':' ID 
 ;

capB
 : {input.LT(1).getText().equals("B")}? ID
 ;

ID : LETTER+;
WS : (' '|'\r'|'\t'|'\u000C'|'\n') {$channel=HIDDEN;};

fragment LETTER: 'A'..'Z' ;

which would parse the input B:B B B:C into the following parse tree:

[Figure: parse tree for the input B:B B B:C]

[1] What is a 'semantic predicate' in ANTLR?
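The idea behind the predicate can be sketched without ANTLR: the lexer only ever produces ID tokens, and the parser decides from the token text whether an ID may act as a prefix. The hand-written Java below is an illustration of that division of labour, not the code ANTLR generates; the class name `PredicateSketch` and the string labels in the result are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PredicateSketch {
    // Lexer: each run of capital letters is one ID token and ':' is its own token.
    // Everything else (whitespace in this sketch) is skipped.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("[A-Z]+|:").matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Parser for: goal : (prefixVar | ID)+ EOF ;  prefixVar : capB ':' ID ;
    static List<String> parse(String input) {
        List<String> tokens = tokenize(input);
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            String t = tokens.get(i);
            if (t.equals(":")) {
                throw new IllegalArgumentException("unexpected ':' at token " + i);
            }
            boolean colonNext = i + 1 < tokens.size() && tokens.get(i + 1).equals(":");
            if (colonNext) {
                // Counterpart of the predicate: an ID is a prefix only if its text is "B".
                if (!t.equals("B")) {
                    throw new IllegalArgumentException("prefix must be B, got " + t);
                }
                if (i + 2 >= tokens.size() || tokens.get(i + 2).equals(":")) {
                    throw new IllegalArgumentException("expected ID after ':'");
                }
                result.add("prefixVar(" + t + ":" + tokens.get(i + 2) + ")");
                i += 3;
            } else {
                result.add("id(" + t + ")");
                i++;
            }
        }
        if (result.isEmpty()) {
            throw new IllegalArgumentException("expected at least one item");
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("B:B B B:C"));
    }
}
```

Because the check lives in the parser, the lexer needs no special B rule and whitespace stays on the hidden channel as usual.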

Bart Kiers

Try this one:

grammar PrefixProblem;

options
{
    language = Java;
}

goal: (size_prefix ':')? (id|B);

size_prefix: B;

id: LETTER+;

LETTER: 'A'|'C'..'Z' ;

B: 'B';

WSFULL:(' '|'\r'|'\t'|'\u000C'|'\n') {$channel=HIDDEN;};
sm13294
  • Actually, it has the same con as PrefixSolution1: it requires splitting up the lexer rules, which I want to avoid – Overdose Apr 26 '12 at 11:45
  • Oh yes, sorry, I just realized that I arrived at the same solution as the 2nd one. I will try something else now – sm13294 Apr 26 '12 at 11:47