6

I am writing a grammar to handle scalar and vector expressions. The grammar below is simplified to show the problem I have where a scalar expression can be derived from a vector and a vector can be derived from a scalar. For example, a vector could be a literal [1, 2, 3] or the product of a scalar and a vector 2 * [1, 2, 3] (equivalent to [2, 4, 6]). A scalar could be a literal 2 or an index into a vector [1, 2, 3][1] (equivalent to 2).

grammar LeftRecursion;

Integer
    : [0-9]+
    ;

WhiteSpace
    : [ \t\r\n]+ -> skip
    ;

input
    : expression EOF;

expression
    : scalar
    | vector
    ;

scalar
    : Integer
    | vector '[' Integer ']'
    ;

vector
    : '[' Integer ',' Integer ',' Integer ']'
    | scalar '*' vector
    ;

ANTLR4 gives me the error: The following sets of rules are mutually left-recursive [scalar, vector]. This makes sense because scalar references vector and vice-versa, but at the same time it should be deterministic.

How would I refactor this grammar to avoid the mutual (indirect) left-recursion? I could expand one of the terms inplace, but that would introduce a lot of duplication in the full grammar where there are more alternatives for vector and scalar. I could also refactor the grammar to have a primary expression, but I don't want to allow scalar '*' scalar as a valid vector alternative. Are there other options?

Community
  • 1
  • 1
Rangi Keen
  • 935
  • 9
  • 29

2 Answers2

4

AFAIK, there is no way around it but to expand to eliminate the indirect recursive rule:

expression
    : scalar
    | vector
    ;

scalar
    : '[' Integer ',' Integer ',' Integer ']' '[' Integer ']'
    | scalar '*' vector '[' Integer ']'
    | Integer
    ;

vector
    : '[' Integer ',' Integer ',' Integer ']'
    | scalar '*' vector
    ;
Bart Kiers
  • 166,582
  • 36
  • 299
  • 288
  • Couldn't you avoid the duplication by adding an extra rule? I.e. `vector_literal: '['Integer, Integer, Integer ']'` and use it in `scalar` and `vector`? `scalar: vector_literal '[' Integer ']'`, etc. – Milan Iliev Oct 09 '15 at 13:55
-1
scalar
    : Integer
    | vector '[' Integer ']'
    ;

vector
    : '[' Integer ',' Integer ',' Integer ']'
    | scalar '*' vector
    ;

gives that you could write an expressions

[i,i,i][i] * [i,i,i][i] * ... * [i,i,i]

which would render a stack overflow of the parser for java and other languages with limited stack-depth.

I think you should create a different grammatical rule for vector-lookups, it's not a scalar, it just results in a scalar, but this should be handled in the parser tree handling, not in ANTLR.

claj
  • 5,172
  • 2
  • 27
  • 30
  • 1
    Almost any grammar could result in unlimited expression depth. Take, for example, a self-referential rule `scalar : Integer | scalar '*' scalar;`. That could be expanded to any length as well `1 * 2 * 3 * 4 * ... * n`. – Rangi Keen Dec 27 '13 at 15:42