In my grammar, I want to have both "variable identifiers" and "function identifiers". Essentially, I want to be less restrictive about the characters allowed in function identifiers. However, I am running into the issue that every valid variable identifier is also a valid function identifier.

As an example, say I want to allow uppercase letters in a function identifier but not in a variable identifier. My current (presumably naive) grammar looks like:

prog : 'func' FunctionId
     | 'var' VariableId
     ;

FunctionId : [a-zA-Z]+ ;
VariableId : [a-z]+ ;

With the above rules, var hello fails to parse. If I understand correctly, this is because FunctionId is defined first, so "hello" is treated as a FunctionId.
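
To make the tie-breaking concrete, here is a minimal Java sketch that imitates how the lexer picks a rule: longest match first, and the earliest-defined rule when the lengths are equal. It is only a simulation built on java.util.regex, not ANTLR's real API; the class name LexerTieBreakDemo and the method winningRule are made up for illustration, and only the two named rules are modelled (the 'func' and 'var' literals are left out).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the lexer's rule selection; not part of ANTLR.
class LexerTieBreakDemo {

    // Rules in definition order, mirroring the grammar from the question.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("FunctionId", Pattern.compile("[a-zA-Z]+"));
        RULES.put("VariableId", Pattern.compile("[a-z]+"));
    }

    // Returns the name of the rule that wins at the start of the input:
    // the longest match, or the earliest-defined rule when lengths are equal.
    static String winningRule(String input) {
        String best = null;
        int bestLength = 0;
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            // lookingAt() anchors the match at the start of the input.
            // The strict ">" keeps the earlier rule when lengths tie.
            if (m.lookingAt() && m.end() > bestLength) {
                best = rule.getKey();
                bestLength = m.end();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(winningRule("hello")); // FunctionId
        System.out.println(winningRule("Hello")); // FunctionId
    }
}
```

Both rules match all five characters of "hello", so the first one listed wins and VariableId is never produced, which matches the failure described above.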

Can I make antlr choose the more specific valid rule?

Panda

2 Answers

An explanation of why your grammar does not work as expected can be found here.

You can solve this with semantic predicates:

grammar Test;

prog : 'func' functionId
     | 'var' variableId
     ;

functionId : Id;
variableId : {isVariableId(getCurrentToken().getText())}? Id ;

Id : [a-zA-Z]+; 

On the lexer level there will be only ids. On the parser level you can restrict an id to lowercase characters. isVariableId(String) would look like:

public boolean isVariableId(String text) {
    return text.matches("[a-z]+");
}
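
As a rough illustration of what the predicate buys you, the sketch below mimics the two parser alternatives in plain Java, without the ANTLR runtime. The class PredicateDemo and the methods isId and accepts are hypothetical names introduced only for this example; isVariableId is the helper shown above.

```java
// Hypothetical, ANTLR-free imitation of the grammar with the semantic predicate.
class PredicateDemo {

    // Same check as the helper used in the predicate.
    static boolean isVariableId(String text) {
        return text.matches("[a-z]+");
    }

    // Mirrors the single lexer rule Id : [a-zA-Z]+ ;
    static boolean isId(String text) {
        return text.matches("[a-zA-Z]+");
    }

    // Mirrors prog: 'func' accepts any Id, 'var' accepts an Id only
    // when the predicate holds for the token text.
    static boolean accepts(String keyword, String id) {
        if (!isId(id)) {
            return false;
        }
        switch (keyword) {
            case "func":
                return true;
            case "var":
                return isVariableId(id);
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("var", "hello"));  // true
        System.out.println(accepts("var", "Hello"));  // false
        System.out.println(accepts("func", "Hello")); // true
    }
}
```

The point is that the restriction moves out of the lexer: every identifier becomes the same token type, and the case check happens only where a variable identifier is expected.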
CoronA

Can I make antlr choose the more specific valid rule?

No (as already mentioned). The lexer simply matches as many characters as it can, and when two or more rules match the same number of characters, the one defined first "wins". There is no way around this.

I'd go for something like this:

prog : 'func' functionId
     | 'var' variableId
     ;

functionId : LowerCaseId | UpperCaseId ;
variableId : LowerCaseId ;

LowerCaseId : [a-z]+ ;
UpperCaseId : [A-Z] [a-zA-Z]* ;
Bart Kiers
  • To be closer to the problem: `UpperCaseId : [A-Za-z]+`. Defining `LowerCaseId` before `UpperCaseId` prevents ambiguity, because the rule defined first wins when both match the same text. – CoronA Apr 15 '18 at 08:16
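
The two definitions of UpperCaseId differ for identifiers that start with a lowercase letter but contain an uppercase one, such as "hEllo". The Java sketch below simulates longest-match tokenization (with ties going to the first rule) for both variants. It is an approximation built on java.util.regex rather than the ANTLR runtime, and the names MixedCaseTokenDemo, rules and tokenize are invented for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical simulation of the lexer; not part of ANTLR.
class MixedCaseTokenDemo {

    // LowerCaseId is always defined first; only the UpperCaseId pattern varies.
    static Map<String, Pattern> rules(String upperCasePattern) {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("LowerCaseId", Pattern.compile("[a-z]+"));
        rules.put("UpperCaseId", Pattern.compile(upperCasePattern));
        return rules;
    }

    // Splits the input into "RuleName:text" entries using longest match,
    // preferring the earlier rule when two matches have the same length.
    static List<String> tokenize(String input, Map<String, Pattern> rules) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestRule = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // lookingAt() anchors at the region start; end() is an absolute index.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestRule = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestRule == null) {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
            tokens.add(bestRule + ":" + input.substring(pos, bestEnd));
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Variant from the answer: UpperCaseId : [A-Z] [a-zA-Z]* ;
        System.out.println(tokenize("hEllo", rules("[A-Z][a-zA-Z]*")));
        // prints [LowerCaseId:h, UpperCaseId:Ello]

        // Variant from the comment: UpperCaseId : [A-Za-z]+ ;
        System.out.println(tokenize("hEllo", rules("[A-Za-z]+")));
        // prints [UpperCaseId:hEllo]
    }
}
```

With the answer's rule, "hEllo" is split into two tokens, so a parser rule expecting a single identifier token would presumably reject func hEllo. With the comment's rule it stays one UpperCaseId token, while an all-lowercase name such as "hello" is still a LowerCaseId because that rule is listed first.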