7

I have a small custom scripting language, and I am trying to update it to allow boolean expressions such as a > 2 and a > 2 and (b < 3 or c > 5). It's the parenthetical expressions that I am having trouble with here.

Here is a (edited since the original post based on the answer from @Bart Kiers) full grammar that exhibits the problem. This is a pared-down version of my actual grammar, but the problem occurs here too.

grammar test;


options {
    language = 'JavaScript'; 
    output = AST;
} 


statement 
    :   value_assignment_statement  
        EOF
    ;


value_assignment_statement 
    :   IDENT
        '='
        expression                      
    ;

value_expression 
    :   value_list_expression           
    |   IDENT                           
    ;


value_list_expression 
    :   value_enumerated_list       
    ;


value_enumerated_list : '{' unary+ '}'
    ;



term 
    :   LPAREN expression RPAREN        
    |   INTEGER                         
    |   value_expression                
    ;

unary : ( '+' | '-' )* term
    ;

mult :  unary ( ('*' | '/') unary)*
    ;

expression : mult ( ('+' | '-') mult )*
    ;


boolean 
    :   boolean_expression
        EOF
    ;

boolean_expression
    :   boolean_or_expression
    ;

boolean_or_expression 
    :   boolean_and_expression (OR boolean_and_expression)*
    ;

boolean_and_expression 
    :   boolean_rel_expression (AND boolean_rel_expression)*
    ;

boolean_rel_expression
    :   boolean_neg_expression relational_operator boolean_neg_expression
    ;

boolean_neg_expression 
    :   (NOT)? atom
    ;

atom
    :   LPAREN boolean_expression RPAREN
    //| expression
    ;


relational_operator : '=' | '>' | '<';


LPAREN      :   '(';
RPAREN      :   ')';
AND         :   'and';
OR          :   'or';
NOT         :   'not';
IDENT       :   LETTER LETTER+;
INTEGER     :   DIGIT+;
WS          :   (' ' | '\n' | '\r' | '\t')+     { $channel = HIDDEN; };

fragment DIGIT      : '0'..'9';
fragment LETTER     : ('a'..'z' | 'A'..'Z');

My attempt to accommodate parenthetical boolean expressions such as a > 2 or (b < 3) is in the commented-out line in the atom rule. When I uncomment this line and include it in the grammar, ANTLR gives me this error:

[fatal] rule atom has non-LL(*) decision due to recursive rule invocations reachable from alts 1,2. Resolve by left-factoring or using syntactic predicates or using backtrack=true option.

I would like to address this by removing the recursion, but I can't seem to make the transition from the Wikipedia description on how to remove left recursion to my own stuff.

In using this grammar, I want sometimes to use statement as a root with input such as abc = 2 + 3, which assigns a value to a variable named abc. Other times I want to use the grammar to evaluate an expression with boolean as the root with input such as abc > 3 and (xyz < 5 or xyz > 10). When I tried to use @Bart's answer as a model, it worked fine until I tried to merge the parts of the grammar used by statement with the parts used by boolean. They should both be able to use an expression, but that's where I'm stuck with this left recursion error.

So, how can I both handle parentheses and avoid the left recursion problem?

Chris Farmer
  • 24,974
  • 34
  • 121
  • 164
  • Does your `expression` symbol have a rule for parenthesized expressions? – Fred Foo Jul 08 '11 at 19:34
  • @larsmans: It can represent parenthetical expressions that evaluate to a number, such as `(3 + x) * 2` but doesn't accommodate the boolean evaluations I am doing now. – Chris Farmer Jul 08 '11 at 19:55

1 Answers1

11

Boolean expressions are just the same as the additive- and multiplicative expression, and should therefore not be separated from them. Here's how to account for all types of expressions:

grammar test;

parse
  :  expression EOF
  ;

expression 
  :  or
  ;

or
  :  and (OR and)*
  ;

and
  :  rel (AND rel)*
  ;

rel
  :  add (('=' | '>' | '<') add)*
  ;

add
  :  mult (('+' | '-') mult)*
  ;

mult
  :  unary (('*' | '/') unary)*
  ;

unary 
  :  '-' term
  |  '+' term
  |  NOT term
  |  term
  ;

term 
  :  INTEGER  
  |  IDENT       
  |  list
  |  '(' expression ')'
  ;

list 
  :  '{' (expression (',' expression)*)? '}'
  ;

AND     :  'and';
OR      :  'or';
NOT     :  'not';
IDENT   :  LETTER LETTER*;
INTEGER :  DIGIT+;
WS      :  (' ' | '\n' | '\r' | '\t')+  { $channel = HIDDEN; };

fragment DIGIT   : '0'..'9';
fragment LETTER  : ('a'..'z' | 'A'..'Z');

which will parse the example input:

abc > 3 and (xyz < 5 or xyz > {1, 2, 3})

into the following parse tree:

enter image description here

Bart Kiers
  • 166,582
  • 36
  • 299
  • 288
  • Thanks for the reply. This makes sense to me, when used in isolation. I now see that the problem lies in how this boolean-related stuff interacts with the other expression-based stuff in my grammar. I have edited my original post with a subset of my real grammar that exhibits the problem. – Chris Farmer Jul 08 '11 at 21:50
  • @Chris, see my revised answer. – Bart Kiers Jul 10 '11 at 06:13
  • thanks so much for the update. The main reason I thought I wanted them separate is that in certain contexts I want the input to only be able to support some sort of relational comparison. In the grammar you have, there's only one root and `a + 3` is always going to be valid input, but sometimes I don't want it to be. So, in my use case, I would say that booleans are *not* the same as additive/multiplicative expressions as you suggest at the beginning of your answer. Is there some reason that it's dumb to think of them as I'm trying to? – Chris Farmer Jul 11 '11 at 15:26
  • @Chris, I'm not entirely sure what you're trying to do. But can't you just do it as I suggest (or very similar) and while evaluating (walking over the AST), perform certain checks to see if certain expressions are allowed at a specific location in the AST. If this isn't an option for you, please edit your original question (or post a new one) and give a couple of example input sources that show the problem. Good luck. – Bart Kiers Jul 12 '11 at 10:18