
I'm writing a grammar and had to get rid of left recursion. It seems to work for everything except the addition operator.

Here is the related part of my grammar:

SUBTRACT: '-';
PLUS: '+';
DIVIDE: '/';
MULTIPLY: '*';

expr: 
      (
        IDENTIFIER 
        | INTEGER 
        | STRING 
        | TRUE 
        | FALSE
      )
      (
        PLUS expr 
        | SUBTRACT expr 
        | MULTIPLY expr 
        | DIVIDE expr 
        | LESS_THAN expr 
        | LESS_THAN_OR_EQUAL expr 
        | EQUALS expr
      )*
      ;

INTEGER: ('0'..'9')*;
IDENTIFIER: ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_')*;

Then when I try to do something like

x*1

It works perfectly. However, when I try to do something like

x+1

I get an error saying:

MismatchedTokenException: mismatched input '+' expecting '\u001C'

I've been at this for a while but don't get why it works with *, -, and /, but not +. I have the exact same code for all of them.

Edit: If I reorder the alternatives and put SUBTRACT above PLUS, the + symbol now works but the - symbol doesn't. Why would ANTLR care about the order of the alternatives like that?

guy

2 Answers


Avoiding left recursion (in an expression grammar) is usually done like this:

grammar Expr;

parse
  :  expr EOF
  ;

expr
  :  equalityExpr
  ;

equalityExpr
  :  relationalExpr (('==' | '!=') relationalExpr)*
  ;

relationalExpr
  :  additionExpr (('>=' | '<=' | '>' | '<') additionExpr)*
  ;

additionExpr
  :  multiplyExpr (('+'| '-') multiplyExpr)*
  ;

multiplyExpr
  :  atom (('*' | '/') atom)*
  ;

atom
  :  IDENTIFIER
  |  INTEGER
  |  STRING
  |  TRUE
  |  FALSE
  |  '(' expr ')'
  ;

// ... lexer rules ...

For example, the input A+B+C would be parsed as follows:

[image: parse tree for A+B+C]

Also see this related answer: ANTLR: Is there a simple example?
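To see how the layered rules translate into precedence, here is a minimal hand-written recursive-descent sketch in Python that mirrors the grammar above, with one method per rule. It is not generated by ANTLR; the `Parser` class, the `tokenize` helper and the tuple-shaped tree are names made up for illustration. STRING is left out, and TRUE/FALSE would simply be read as identifiers.

```python
import re

# Hypothetical tokenizer: two-character operators first, then single ones,
# then identifiers and integers. Not the lexer ANTLR would generate.
TOKEN = re.compile(r"\s*(==|!=|>=|<=|[-+*/<>()]|[A-Za-z_]\w*|\d+)")

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.i = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.i += 1
        return tok

    def binary(self, operand, ops):
        # Implements the shape: operand (op operand)*
        # Folding to the left makes every operator left-associative.
        left = operand()
        while self.peek() in ops:
            op = self.take()
            left = (op, left, operand())
        return left

    def parse(self):  # parse : expr EOF
        tree = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self):
        return self.equality()

    def equality(self):
        return self.binary(self.relational, ("==", "!="))

    def relational(self):
        return self.binary(self.addition, (">=", "<=", ">", "<"))

    def addition(self):
        return self.binary(self.multiply, ("+", "-"))

    def multiply(self):
        return self.binary(self.atom, ("*", "/"))

    def atom(self):
        tok = self.take()
        if tok == "(":
            inner = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return inner
        if tok is None or not re.fullmatch(r"\w+", tok):
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok

print(Parser("A+B+C").parse())
print(Parser("2*3+4").parse())
```

Because `multiply` is only reachable through `addition`, a `*` always binds tighter than a `+`, regardless of where it appears in the input.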

Bart Kiers

I fixed it by moving the trailing part, which came from removing the left recursion, into its own rule:

expr: 
      (
        IDENTIFIER 
        | INTEGER 
        | STRING 
        | TRUE 
        | FALSE
      ) lr*
      ;

lr:
          PLUS expr
        | SUBTRACT expr
        | MULTIPLY expr
        | DIVIDE expr
        | LESS_THAN expr
        | LESS_THAN_OR_EQUAL expr
        | EQUALS expr
      ;
guy
  • This is not the way to go: the expression `2*3+4` is being parsed as `(2*(3+4))` (i.e. the `+` now has a higher precedence than `*`). – Bart Kiers Mar 04 '11 at 08:22
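The precedence problem described in the comment can be reproduced with a small Python sketch of the shape `expr: atom (op expr)*`. It is a hand-written approximation, not ANTLR output: `tokenize` and `parse_flat` are hypothetical names, only the four arithmetic operators are handled, and it assumes the innermost `expr` greedily consumes all remaining operators.

```python
import re

OPS = ("+", "-", "*", "/")

def tokenize(text):
    # Hypothetical tokenizer: arithmetic operators, identifiers and integers only.
    return re.findall(r"[-+*/]|\w+", text)

def parse_flat(tokens, i=0):
    """Mimics expr: atom (op expr)* and returns (tree, next_index)."""
    left = tokens[i]
    i += 1
    while i < len(tokens) and tokens[i] in OPS:
        op = tokens[i]
        # The right operand is a full expr, so it swallows the rest of the
        # input no matter which operators follow.
        right, i = parse_flat(tokens, i + 1)
        left = (op, left, right)
    return left, i

tree, _ = parse_flat(tokenize("2*3+4"))
print(tree)
```

Every operator sits in the same rule and recurses to the right, so the tree nests purely by position and `2*3+4` comes out grouped as `2*(3+4)`.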