I am creating an ANTLR 4 grammar for a moderately simple language, and I am struggling to get the grammar to differentiate between unary and binary minus. I have read all the other posts I can find on this topic here on Stack Overflow, but the answers either apply to ANTLR 3 in ways I cannot work out how to express in ANTLR 4, or I have not managed to translate their advice to my own situation. When I experiment with other alternatives, I often end up with rules that ANTLR cannot resolve unambiguously.
Below is the ANTLR file in its entirety. The ambiguity in this version occurs around the production:
binop_expr
: SUMOP product
| product ( SUMOP product )*
;
(I had originally used UNARY_ABELIAN_OP instead of the second SUMOP, but that led to a different kind of ambiguity: the tool apparently could not recognise that it needed to differentiate between the same token in two different contexts. I mention this because one of the posts here recommends using a different name for the unary operator.)
grammar Kant;
program
: type_declaration_list main
;
type_declaration_list
: type_declaration
| type_declaration_list type_declaration
| /* null */
;
type_declaration
: 'context' JAVA_ID '{' context_body '}'
| 'class' JAVA_ID '{' class_body '}'
| 'class' JAVA_ID 'extends' JAVA_ID '{' class_body '}'
;
context_body
: context_body context_body_element
| context_body_element
| /* null */
;
context_body_element
: method_decl
| object_decl
| role_decl
| stageprop_decl
;
role_decl
: 'role' JAVA_ID '{' role_body '}'
| 'role' JAVA_ID '{' role_body '}' REQUIRES '{' self_methods '}'
| access_qualifier 'role' JAVA_ID '{' role_body '}'
| access_qualifier 'role' JAVA_ID '{' role_body '}' REQUIRES '{' self_methods '}'
;
role_body
: method_decl
| role_body method_decl
| object_decl // illegal
| role_body object_decl // illegal — for better error messages only
;
self_methods
: self_methods ';' method_signature
| method_signature
| self_methods /* null */ ';'
;
stageprop_decl
: 'stageprop' JAVA_ID '{' stageprop_body '}'
| 'stageprop' JAVA_ID '{' stageprop_body '}' REQUIRES '{' self_methods '}'
| access_qualifier 'stageprop' JAVA_ID '{' stageprop_body '}'
| access_qualifier 'stageprop' JAVA_ID '{' stageprop_body '}' REQUIRES '{' self_methods '}'
;
stageprop_body
: method_decl
| stageprop_body method_decl
;
class_body
: class_body class_body_element
| class_body_element
| /* null */
;
class_body_element
: method_decl
| object_decl
;
method_decl
: method_decl_hook '{' expr_and_decl_list '}'
;
method_decl_hook
: method_signature
| method_signature CONST
;
method_signature
: access_qualifier return_type method_name '(' param_list ')'
| access_qualifier return_type method_name
| access_qualifier method_name '(' param_list ')'
;
expr_and_decl_list
: object_decl
| expr ';' object_decl
| expr_and_decl_list object_decl
| expr_and_decl_list expr
| expr_and_decl_list /*null-expr */ ';'
| /* null */
;
return_type
: type_name
| /* null */
;
method_name
: JAVA_ID
;
access_qualifier
: 'public' | 'private' | /* null */
;
object_decl
: access_qualifier compound_type_name identifier_list ';'
| access_qualifier compound_type_name identifier_list
| compound_type_name identifier_list /* null expr */ ';'
| compound_type_name identifier_list
;
compound_type_name
: type_name '[' ']'
| type_name
;
type_name
: JAVA_ID
| 'int'
| 'double'
| 'char'
| 'String'
;
identifier_list
: JAVA_ID
| identifier_list ',' JAVA_ID
| JAVA_ID ASSIGN expr
| identifier_list ',' JAVA_ID ASSIGN expr
;
param_list
: param_decl
| param_list ',' param_decl
| /* null */
;
param_decl
: type_name JAVA_ID
;
main
: expr
;
expr
: block
| expr '.' message
| expr '.' CLONE
| expr '.' JAVA_ID
| ABELIAN_INCREMENT_OP expr '.' JAVA_ID
| expr '.' JAVA_ID ABELIAN_INCREMENT_OP
| /* this. */ message
| JAVA_ID
| constant
| if_expr
| for_expr
| while_expr
| do_while_expr
| switch_expr
| BREAK
| CONTINUE
| boolean_expr
| binop_expr
| '(' expr ')'
| <assoc=right> expr ASSIGN expr
| NEW message
| NEW type_name '[' expr ']'
| RETURN expr
| RETURN
;
relop_expr
: sexpr RELATIONAL_OPERATOR sexpr
;
// This is just a duplication of expr. We separate it out
// because a top-down antlr4 parser can't handle the
// left associative ambiguity. It is used only
// for abelian types.
sexpr
: block
| sexpr '.' message
| sexpr '.' CLONE
| sexpr '.' JAVA_ID
| ABELIAN_INCREMENT_OP sexpr '.' JAVA_ID
| sexpr '.' JAVA_ID ABELIAN_INCREMENT_OP
| /* this. */ message
| JAVA_ID
| constant
| if_expr
| for_expr
| while_expr
| do_while_expr
| switch_expr
| BREAK
| CONTINUE
| '(' sexpr ')'
| <assoc=right> sexpr ASSIGN sexpr
| NEW message
| NEW type_name '[' expr ']'
| RETURN expr
| RETURN
;
block
: '{' expr_and_decl_list '}'
| '{' '}'
;
expr_or_null
: expr
| /* null */
;
if_expr
: 'if' '(' boolean_expr ')' expr
| 'if' '(' boolean_expr ')' expr 'else' expr
;
for_expr
: 'for' '(' object_decl boolean_expr ';' expr ')' expr // O.K. — expr can be a block
| 'for' '(' JAVA_ID ':' expr ')' expr
;
while_expr
: 'while' '(' boolean_expr ')' expr
;
do_while_expr
: 'do' expr 'while' '(' boolean_expr ')'
;
switch_expr
: SWITCH '(' expr ')' '{' ( switch_body )* '}'
;
switch_body
: ( CASE constant | DEFAULT ) ':' expr_and_decl_list
;
binop_expr
: SUMOP product
| product ( SUMOP product )*
;
product
: atom ( MULOP atom )*
;
atom
: null_expr
| JAVA_ID
| JAVA_ID ABELIAN_INCREMENT_OP
| ABELIAN_INCREMENT_OP JAVA_ID
| constant
| '(' expr ')'
| array_expr '[' sexpr ']'
| array_expr '[' sexpr ']' ABELIAN_INCREMENT_OP
| ABELIAN_INCREMENT_OP array_expr '[' sexpr ']'
;
null_expr
: NULL
;
array_expr
: sexpr
;
boolean_expr
: boolean_product ( BOOLEAN_SUMOP boolean_product )*
;
boolean_product
: boolean_atom ( BOOLEAN_MULOP boolean_atom )*
;
boolean_atom
: BOOLEAN
| JAVA_ID
| '(' boolean_expr ')'
| LOGICAL_NOT boolean_expr
| relop_expr
;
constant
: STRING
| INTEGER
| FLOAT
| BOOLEAN
;
message
: <assoc=right> JAVA_ID '(' argument_list ')'
;
argument_list
: expr
| argument_list ',' expr
| /* null */
;
// Lexer rules
STRING : '"' ( ~'"' | '\\' '"' )* '"' ;
INTEGER : ('1' .. '9')+ ('0' .. '9')* | '0';
FLOAT : (('1' .. '9')* | '0') '.' ('0' .. '9')* ;
BOOLEAN : 'true' | 'false' ;
SWITCH : 'switch' ;
CASE : 'case' ;
DEFAULT : 'default' ;
BREAK : 'break' ;
CONTINUE : 'continue' ;
RETURN : 'return' ;
REQUIRES : 'requires' ;
NEW : 'new' ;
CLONE : 'clone' ;
NULL : 'null' ;
CONST : 'const' ;
RELATIONAL_OPERATOR : '!=' | '==' | '>' | '<' | '>=' | '<=';
LOGICAL_NOT : '!' ;
BOOLEAN_MULOP : '&&' ;
BOOLEAN_SUMOP : '||' | '^' ;
SUMOP : '+' | '-' ;
MULOP : '*' | '/' ;
ABELIAN_INCREMENT_OP : '++' | '--' ;
JAVA_ID: (('a' .. 'z') | ('A' .. 'Z')) (('a' .. 'z') | ('A' .. 'Z') | ('0' .. '9') | '_')* ;
INLINE_COMMENT: '//' ~[\r\n]* -> channel(HIDDEN) ;
C_COMMENT: '/*' .*? '*/' -> channel(HIDDEN) ;
WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ -> channel(HIDDEN) ;
ASSIGN : '=' ;
A typical symptom is that the parser does not recognise the unary minus in the following statement; it simply rejects the construct:
Base b1 = new Base(-threetwoone);
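To check whether the conflict comes from the binop_expr production itself or from the rules around it, the production can be isolated in a much smaller grammar. The following is only a sketch under stated assumptions: the grammar name MinusRepro and the entry rule start are hypothetical, atom is cut down to three alternatives, and the INTEGER and JAVA_ID rules are simplified stand-ins for the originals. It is not guaranteed to behave the same way as the full grammar.

```antlr
// Reduced grammar for experimentation only. The grammar name MinusRepro and
// the entry rule start are hypothetical; they are not part of the Kant grammar.
grammar MinusRepro;

start
    : binop_expr EOF
    ;

// Copied unchanged from the full grammar.
binop_expr
    : SUMOP product
    | product ( SUMOP product )*
    ;

product
    : atom ( MULOP atom )*
    ;

// Cut down to three alternatives; the parenthesised form recurses into
// binop_expr here because expr is not part of this reduction.
atom
    : JAVA_ID
    | INTEGER
    | '(' binop_expr ')'
    ;

SUMOP : '+' | '-' ;
MULOP : '*' | '/' ;
// Simplified stand-ins for the original INTEGER and JAVA_ID rules.
INTEGER : '0' | [1-9] [0-9]* ;
JAVA_ID : [a-zA-Z] [a-zA-Z0-9_]* ;
WHITESPACE : [ \t\r\n]+ -> channel(HIDDEN) ;
```

Saved as MinusRepro.g4, this can be run through the ANTLR tool and then exercised with the TestRig (commonly aliased as grun) using its -tokens and -tree options on an input such as -threetwoone. If the reduced grammar accepts that input while the full one does not, that would suggest looking at how binop_expr interacts with the other alternatives of expr, several of which (JAVA_ID, constant, '(' expr ')') can also be reached through atom.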