3

Taking an (almost) textbook example, where we expect multiplication to have precedence over addition, but also include an optional part to match.

expr : expr '*' expr  ('ALSO')?
     | expr '+' expr
     | INT
     ;

INT: [0-9]+;

WS : [ \t\r\n]+ -> skip ;

When trying out the grammar with 3 * 4 + 2 we get an unexpected tree that looks like

                  expr:1
               /    |   \
            expr:1  *   expr:2
              |         /  |  \
              3    expr:1  +  expr:1
                     |           |
                     4           2

However, when use 3 + 4 * 2 we get what I might expect

                  expr:1
               /    |   \
            expr:1  +   expr:2
              |         /  |  \
              3    expr:1  *  expr:1
                     |           |
                     4           2

Also, if you switch the optional token to the second line, we get the expected tree every time.

expr : expr '*' expr
     | expr '+' expr ('ALSO')?
     | INT
     ;

I also tried this using the non-greedy operator ??, and defining lexer tokens so we don't have to worry about oddities around ordering due to implicit tokens.

What would explain this ordering?

CodyT
  • 33
  • 3

1 Answers1

2

That looks like a bug. You can report it here: https://github.com/antlr/antlr4/issues (if it's not already reported... I did not check)

It seems a workaround would be to include an extra alternative that does not contain the ALSO token:

expr : expr '*' expr 'ALSO'
     | expr '*' expr
     | expr '+' expr
     | INT
     ;

which produces the expected parse trees for both 3 * 4 + 2 and 3 + 4 * 2.

Bart Kiers
  • 166,582
  • 36
  • 299
  • 288
  • Bart, I was crawling through the golden `parsing` badge holders - could you have a look [at my question with an open bounty](https://stackoverflow.com/questions/55405055/nodevisitor-class-for-peg-parser-in-python/55445187#55445187) ? – Jan Apr 01 '19 at 09:33