I am trying to work out how to handle ambiguities in ANTLR.
I need to correctly parse identifiers that may or may not carry a size prefix.
First I came up with this buggy grammar:
grammar PrefixProblem;
options
{
language = Java;
}
goal: (size_prefix ':')? id;
size_prefix: B;
id: LETTER+;
LETTER: 'A'..'Z' ;
B: 'B';
WSFULL:(' '|'\r'|'\t'|'\u000C'|'\n') {$channel=HIDDEN;};
I need to handle the input B as an id, and the input B:B as the id B with the size prefix B. It didn't work.
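As far as I can tell, the cause is in the lexer: 'B' is matched by both LETTER and B, and I am assuming the lexer prefers the rule that is listed first, so the parser never receives a B token. The Java sketch below is only a toy model of that assumption, not the code ANTLR generates; the class name, the method name and the COLON token name are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the PrefixProblem lexer; NOT the code ANTLR generates.
// Assumption: when two rules match the same character, the one listed first wins.
class PrefixProblemLexerSketch {

    // Returns the token type names the parser would see for the given input.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        for (char c : input.toCharArray()) {
            if (c >= 'A' && c <= 'Z') {
                tokens.add("LETTER");   // LETTER: 'A'..'Z' is listed first
            } else if (c == 'B') {
                tokens.add("B");        // never reached: 'B' is inside 'A'..'Z'
            } else if (c == ':') {
                tokens.add("COLON");    // hypothetical name for the ':' literal
            } else if (" \r\t\f\n".indexOf(c) < 0) {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
            // whitespace (WSFULL) is dropped, mirroring the hidden channel
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("B"));    // [LETTER]
        System.out.println(tokenize("B:B"));  // [LETTER, COLON, LETTER]
    }
}
```

Under this model size_prefix can never match, because no B token is ever produced, which would explain why the prefix is not recognized.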
Then I found two solutions to this problem.
grammar PrefixSolution1;
options
{
language = Java;
}
goal: (size_prefix ':')? id;
size_prefix: B;
id: (LETTER | B)+;
LETTER: 'A' | 'C'..'Z' ;
B: 'B';
WSFULL:(' '|'\r'|'\t'|'\u000C'|'\n') {$channel=HIDDEN;};
In the grammar above, B was removed from the LETTER lexer rule and added back as an alternative in the id rule.
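To check my understanding of why this version works, here is a rough Java model of it (again a sketch with made-up names, not ANTLR output). The lexer's character sets are now disjoint, so every character has exactly one token type, and the parser decides whether a leading B is the prefix by looking at the token that follows it.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of PrefixSolution1; NOT the code ANTLR generates.
class PrefixSolution1Sketch {

    // Lexer: B and LETTER no longer overlap.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        for (char c : input.toCharArray()) {
            if (c == 'B') {
                tokens.add("B");                         // B: 'B'
            } else if (c == 'A' || (c >= 'C' && c <= 'Z')) {
                tokens.add("LETTER");                    // LETTER: 'A' | 'C'..'Z'
            } else if (c == ':') {
                tokens.add("COLON");                     // hypothetical name for ':'
            } else if (" \r\t\f\n".indexOf(c) < 0) {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
            // whitespace (WSFULL) is dropped, mirroring the hidden channel
        }
        return tokens;
    }

    // Parser: goal: (size_prefix ':')? id;   id: (LETTER | B)+;
    // Returns true if the input starts with a size prefix; throws if it does not match goal.
    static boolean hasSizePrefix(List<String> tokens) {
        // Two tokens of lookahead tell a prefix B apart from a B that starts the id.
        boolean prefix = tokens.size() >= 2
                && tokens.get(0).equals("B") && tokens.get(1).equals("COLON");
        int pos = prefix ? 2 : 0;
        if (pos == tokens.size()) {
            throw new IllegalArgumentException("id expected");
        }
        for (; pos < tokens.size(); pos++) {
            if (tokens.get(pos).equals("COLON")) {
                throw new IllegalArgumentException("unexpected ':' at token " + pos);
            }
        }
        return prefix;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("BAB"));                 // [B, LETTER, B]
        System.out.println(hasSizePrefix(tokenize("B")));    // false
        System.out.println(hasSizePrefix(tokenize("B:B")));  // true
    }
}
```

The sketch also shows the cost: any parser rule that means "a letter" has to list both LETTER and B.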
grammar PrefixSolution2;
options
{
language = Java;
}
goal: PREFIX_VAR;
PREFIX_VAR: (B WSFULL* ':' WSFULL*)? ID;
fragment ID: (LETTER)+;
fragment LETTER: 'A'..'Z' ;
fragment B: 'B';
WSFULL:(' '|'\r'|'\t'|'\u000C'|'\n') {$channel=HIDDEN;};
Here I just moved the whole rule, prefix included, into the lexer.
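Since PREFIX_VAR is now a single token, it behaves roughly like one regular expression over characters. The following Java sketch is my own approximation of that rule with java.util.regex (the class and method names are made up); it makes visible that the whitespace around ':' has to be spelled out inside the token, because nothing is hidden from the lexer itself.

```java
import java.util.regex.Pattern;

// Regex approximation of the PREFIX_VAR lexer rule; NOT the code ANTLR generates.
class PrefixVarRegexSketch {

    // WSFULL: ' ' | '\r' | '\t' | '\u000C' | '\n'   ('\u000C' is the form feed, \f)
    private static final String WS = "[ \\r\\t\\f\\n]*";

    // PREFIX_VAR: (B WSFULL* ':' WSFULL*)? ID;   ID: LETTER+;   LETTER: 'A'..'Z'
    private static final Pattern PREFIX_VAR =
            Pattern.compile("(?:B" + WS + ":" + WS + ")?[A-Z]+");

    // True if the whole input is exactly one PREFIX_VAR token.
    static boolean isPrefixVar(String input) {
        return PREFIX_VAR.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isPrefixVar("B"));       // true: plain id, no prefix
        System.out.println(isPrefixVar("B:B"));     // true: prefix B, id B
        System.out.println(isPrefixVar("B : AB"));  // true: whitespace handled by hand
        System.out.println(isPrefixVar("B:"));      // false: id is missing
    }
}
```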
The main drawback of PrefixSolution1 is that I need to split lexer rules into smaller pieces and then concatenate them again in the parser.
The drawback of PrefixSolution2 is that I always have to account explicitly for whitespace characters that would otherwise be ignored.
It seems to me that both solutions will lead to a big mess when writing a grammar for a whole language. Is there any other solution? If not, which of the two approaches is preferable?
All source code is available here