ANTLR Grammar for expressions

Question

I'm trying to implement a expression handling grammar (that deals with nested parenthesis and stuff). I have the following so far, but they can't deal with some cases (successful/failure cases appear after the following code block). Anyone know what's going on?

Note: The varname += and varname = stuff are just some additional AST generation helper stuff in XText. Don't worry about them for now.

...

NilExpression returns Expression:
  'nil';

FalseExpression returns Expression:
  'false';

TrueExpression returns Expression:
  'true';

NumberExpression returns Expression:
  value=Number;

StringExpression returns Expression:
  value=STRING; //EllipsesExpression: '...';
//FunctionExpression: function=function; //don't allow random functions


UnaryExpression:
  op=unop ('(' expr=Expression ')')|expr=Expression;

BinaryExpression:
  'or'? AndOp; //or op

AndOp:
  'and'? ComparisonOp;

ComparisonOp:
  ('>'|'<'|'>='|'<='|'=='|'~=')? ConcatOp;

ConcatOp:
  '..'? AddSubOp;

AddSubOp:
  ('+' '-')? MultDivOp;

MultDivOp:
  ('*' '/')? ExpOp;

ExpOp:
  '^'? (('(' expr=Expression ')')|expr=Expression);

ExprSideOne : Variable|NilExpression|FalseExpression|TrueExpression|
  NumberExpression|StringExpression|UnaryExpression;

Expression:
  ( 
   '('
  expression1=ExprSideOne expression2+=BinaryExpression*
   ')' 
  )
  |
  ( expression1=ExprSideOne expression2+=BinaryExpression* )
;
...

And here's the list of parses/fails:

c = ((b)); //fails
c = ((a not b)); //fails
c = b; //parses
d = (b); //parses

Need recursion. Just out of curiosity, why are your binary operator rules written as unary operators? Is nil >= .. + * (1) supposed to mean something? — Ellery Newcomer, Oct 10 '09 at 04:10

Malaxeur · Accepted Answer · 2009-09-21T03:53:22.663

4

What's going on is that your Expression/Expressions support single parentheses but not multiple parentheses (as you concluded). I don't have ANTLR specific experience but I've worked with Javacc which shares many similar concepts (I wrote a grammar for Prolog... don't ask).

To handle nested parentheses, you typically have something similar to:

ParenthesisExpression: '(' (ParenthesisExpression | Expression) ')';

This would mean that the expression is either wrapped in parentheses or it's just a raw expression. As for how the AST deals with this, a ParenthesisExpression 'is a' Expression, so it can be represented as a subclass or an implementation of (if Expression is an interface/abstract class of sorts).

edited Sep 21 '09 at 03:53

answered Sep 21 '09 at 03:29

Malaxeur

5,973
1
36
34

As an addition, you may have to deal with this case as well: ((a) not (b or (c))) so the fun is in ensuring that your Expression can also fall back to a ParenthesisExpression as well. – Malaxeur Sep 21 '09 at 03:36
Wouldn't that force a outer ()? Around (ParenthesisExpression | Expression) you have a pair of (...)s. – jameszhao00 Sep 21 '09 at 03:40
Sorry I meant the '(' ')' around your thing, not the ( ) :) – jameszhao00 Sep 21 '09 at 03:54
Yes, but only for 'ParenthesisExpression'. So consider a standard expression, it's either wrapped in parentheses or not, right? In the case that it is. The above rule will match. Then consider the case where it isn't, then the 'Expression' rule should catch. – Malaxeur Sep 21 '09 at 03:55
Example: (a) would get matched to 'ParenthesisExpression' once and then fall through to Expression. ((a)) would get matched to ParethesisExpression twice and then to Expression. 'a' would just get matched to Expression. – Malaxeur Sep 21 '09 at 03:56
Hmm that makes sense and works. Time to have fun with ((a) / b). – jameszhao00 Sep 21 '09 at 04:15
I'm noticing a problem though. As the parser goes from left->right, it's reading (, then (, ... (in ((a) + b)) and getting confused about what to do. Is there a special flag to set to alleviate this problem? – jameszhao00 Sep 21 '09 at 05:19
If there's something equivalent to 'lookahead' in ANTLR, that's the way to go. Lookahead is the concept if looking deeper into the expression and being able to bail if it gets confused. A high lookahead value will cause the parser to be much slower (javacc allows you to have customizable lookahead depending on the expression, I'm sure ANTLR does it too) but it will allow for overlapping subcases. – Malaxeur Sep 21 '09 at 18:08

score 1 · Answer 2 · answered Aug 13 '12 at 09:55

Make use of the ^ placed after the token/rule name is very useful for defining expressions.

    expression :    e1 (OR^ e1)* ;
    e1  :   e2 (AND^ e2)*;
    e2  :   e3 (PIPE^ e3)*;
    e3  :   e4 (ANDSYMB^ e4)*;
    e4  :   e5 ((EQUAL^|NOTEQUAL^) e5)*;
    e5  :   e6 ((LESS^|GREATER^) e6)*;
    e6  :   e7 ((PLUS^|MINUS^) e7)* ;
    e7  :   e8 ((STAR^|SLASH^) e8)* ;
    e8  :   e9 (NEW^ ID LPAREN RPAREN)*;
    e9  :   (NOT^)? e10;
    e10 :   e11 | call_def;
    e11 :   constant 
        | '(' expression ')' -> expression;

What does the caret ^ after the rule indicate? Update: i see another SOF seems to explain it http://stackoverflow.com/questions/11365781/caret-prefix-instead-of-postfix-in-antlr — WestCoastProjects, Jul 17 '15 at 20:44

Jason · Answer 3 · 2013-02-06T18:37:50.080

1

I've used this grammar for simple expressions: http://fisheye2.atlassian.com/browse/~raw,r=5175/antlr-examples/C/polydiff/Poly.g

I've also used the grammar contain in this project for more complex expressions: http://www.codeproject.com/KB/recipes/sota_expression_evaluator.aspx

edited Feb 06 '13 at 18:37

answered Nov 30 '09 at 21:52

Jason

2,341
17
14

The fisheye2 link is now dead. – lycono Sep 23 '16 at 18:44

ANTLR Grammar for expressions

3 Answers3

Linked