Match most specific rule

Question

In my grammar, I want to have both "variable identifiers" and "function identifiers". Essentially, I want to be less restrictive about the characters allowed in function identifiers. However, I am running into the issue that every valid variable identifier is also a valid function identifier.

As an example, say I want to allow uppercase letters in a function identifier but not in a variable identifier. My current (presumably naive) grammar might look like this:

prog : 'func' FunctionId
     | 'var' VariableId
     ;

FunctionId : [a-zA-Z]+ ;
VariableId : [a-z]+ ;

With the above rules, var hello fails to parse. If I understand correctly, this is because FunctionId is defined first, so "hello" is treated as a FunctionId.

Can I make ANTLR choose the more specific valid rule?

score 1 · Answer 1 · answered Apr 15 '18 at 05:48

An explanation of why your grammar does not work as expected can be found here.

You can solve this with semantic predicates:

grammar Test;

prog : 'func' functionId
     | 'var' variableId
     ;

functionId : Id;
variableId : {isVariableId(getCurrentToken().getText())}? Id ;

Id : [a-zA-Z]+;

// skip spaces, tabs and line breaks between tokens
WS : [ \t\r\n]+ -> skip ;

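At the lexer level there is only one kind of identifier token, Id. At the parser level, the semantic predicate restricts an Id to lowercase characters when it is used as a variableId. isVariableId(String) can be defined in the grammar's @members block, so that it becomes part of the generated parser, and could look like this: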

public boolean isVariableId(String text) {
    return text.matches("[a-z]+");
}
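To check the predicate's logic without running ANTLR at all, the same method can be called directly. The sketch below is a standalone illustration with a hypothetical class name; it makes isVariableId static so no parser instance is needed, whereas in the generated parser it would be an ordinary instance method.

```java
// Hypothetical standalone check of the predicate logic from the answer.
// In the generated parser, isVariableId would be an instance method (e.g. from @members);
// here it is static so it can be called without a parser.
class VariableIdCheck {

    static boolean isVariableId(String text) {
        // String.matches requires the whole text to match: lowercase ASCII letters only
        return text.matches("[a-z]+");
    }

    public static void main(String[] args) {
        // All three inputs are lexed as the same Id token; only "hello" passes the predicate
        for (String id : new String[] {"hello", "Hello", "helloWorld"}) {
            System.out.println(id + " allowed as variableId: " + isVariableId(id));
        }
    }
}
```

The design choice here is that the lexer stays permissive (one Id rule) and the restriction lives in the parser, where it is known whether a function or a variable identifier is expected.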

score 1 · Answer 2 · Bart Kiers · 2018-04-15T08:31:32.050

Can I make ANTLR choose the more specific valid rule?

No (as already mentioned). The lexer simply matches as many characters as it can, and if two or more rules match the same input, the one defined first "wins". The lexer does not know which rule the parser expects next, so there is no way around this.
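The selection rule described above (longest match first, then definition order on a tie) can be mimicked in a few lines of plain Java. This is a simplified sketch using java.util.regex, not how ANTLR's generated lexer works internally, and the class and method names are hypothetical. It uses the rules from the question and shows that merely reordering them only moves the problem: with VariableId first, hello becomes a VariableId, so 'func hello' would then fail instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: mimics how a lexer picks a rule for one token
// (longest match wins; on a tie, the rule defined first wins).
class LexerOrderDemo {

    static String winningRule(String input, Map<String, Pattern> rules) {
        String best = null;
        int bestLen = 0;
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            // lookingAt() anchors at the start of input without requiring a full match;
            // ">" (not ">=") keeps the earlier rule when lengths are equal
            if (m.lookingAt() && m.end() > bestLen) {
                best = rule.getKey();
                bestLen = m.end();
            }
        }
        return best; // null if no rule matches
    }

    // Rule order from the question: FunctionId is defined first
    static Map<String, Pattern> questionOrder() {
        Map<String, Pattern> rules = new LinkedHashMap<>(); // keeps insertion (= definition) order
        rules.put("FunctionId", Pattern.compile("[a-zA-Z]+"));
        rules.put("VariableId", Pattern.compile("[a-z]+"));
        return rules;
    }

    // Same rules with VariableId defined first
    static Map<String, Pattern> swappedOrder() {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("VariableId", Pattern.compile("[a-z]+"));
        rules.put("FunctionId", Pattern.compile("[a-zA-Z]+"));
        return rules;
    }

    public static void main(String[] args) {
        System.out.println(winningRule("hello", questionOrder())); // FunctionId
        System.out.println(winningRule("hello", swappedOrder()));  // VariableId
        System.out.println(winningRule("Hello", swappedOrder()));  // FunctionId
    }
}
```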

I'd go for something like this:

prog : 'func' functionId
     | 'var' variableId
     ;

functionId : LowerCaseId | UpperCaseId ;
variableId : LowerCaseId ;

LowerCaseId : [a-z]+ ;
UpperCaseId : [A-Z] [a-zA-Z]* ;

// skip spaces, tabs and line breaks between tokens
WS : [ \t\r\n]+ -> skip ;

edited Apr 15 '18 at 08:31

answered Apr 15 '18 at 07:54


To be closer to the problem: `UpperCaseId : [A-Za-z]+`. The definition order (`LowerCaseId` before `UpperCaseId`) resolves the ambiguity. – CoronA Apr 15 '18 at 08:16
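The comment's point shows up with a mixed-case name such as helloWorld. The sketch below (plain Java with hypothetical names, and again only an approximation of an ANTLR lexer for these simple rules) tokenizes with longest-match-then-first-rule and compares the two UpperCaseId definitions. With the answer's [A-Z] [a-zA-Z]*, helloWorld splits into LowerCaseId hello plus UpperCaseId World, so functionId would match only hello; with [A-Za-z]+ it becomes a single UpperCaseId token, while plain hello still becomes LowerCaseId because that rule is defined first.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration: compares the answer's lexer rules with the comment's variant.
class RuleOrderComparison {

    // Splits input into "RuleName:text" tokens using longest match;
    // on equal-length matches the rule inserted first into the map wins.
    static List<String> tokenize(String input, Map<String, Pattern> rules) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestRule = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length()); // match starting at the current position
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestRule = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestRule == null) {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
            tokens.add(bestRule + ":" + input.substring(pos, bestEnd));
            pos = bestEnd;
        }
        return tokens;
    }

    // Lexer rules from the answer
    static Map<String, Pattern> answerRules() {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("LowerCaseId", Pattern.compile("[a-z]+"));
        rules.put("UpperCaseId", Pattern.compile("[A-Z][a-zA-Z]*"));
        return rules;
    }

    // Lexer rules suggested in the comment
    static Map<String, Pattern> commentRules() {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("LowerCaseId", Pattern.compile("[a-z]+"));
        rules.put("UpperCaseId", Pattern.compile("[A-Za-z]+"));
        return rules;
    }

    public static void main(String[] args) {
        for (String input : new String[] {"hello", "Hello", "helloWorld"}) {
            System.out.println(input + "  answer: " + tokenize(input, answerRules())
                    + "  comment: " + tokenize(input, commentRules()));
        }
    }
}
```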
