How to back reference AST in custom rewrite action?

Question

I already know a workaround for this problem, but I would really like to use this particular approach, for at least one reason: it should work.

This is a rule taken from "The Definitive ANTLR Reference" by Terence Parr (the book is for ANTLR3):

expr : (INT -> INT) ('+' i=INT -> ^('+' $expr $i) )*;

If INT is not followed by +, the result is INT (a single node); if it is, a subtree is built with the tree constructed so far (referred to as $expr, initially the first INT) as the left branch.
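To make the intended tree shape concrete, here is a minimal Java sketch of what that rewrite loop computes: a left fold in which the result so far becomes the left child on each iteration. It does not use the ANTLR runtime; `Node` and `LeftFoldDemo` are hypothetical names introduced only for illustration, and the token list stands in for the parser's input.

```java
import java.util.List;

// Hypothetical stand-in for an AST node; NOT an ANTLR runtime class.
final class Node {
    final String text;
    final Node left;
    final Node right;

    Node(String text) {
        this(text, null, null);
    }

    Node(String text, Node left, Node right) {
        this.text = text;
        this.left = left;
        this.right = right;
    }

    // LISP-style rendering: a leaf prints its text, a binary node prints (op left right).
    String toStringTree() {
        if (left == null && right == null) {
            return text;
        }
        return "(" + text + " " + left.toStringTree() + " " + right.toStringTree() + ")";
    }
}

final class LeftFoldDemo {
    // Assumes tokens alternate operand, operator, operand, ... (e.g. "1", "+", "2").
    static Node build(List<String> tokens) {
        Node result = new Node(tokens.get(0));        // (INT -> INT)
        for (int i = 1; i + 1 < tokens.size(); i += 2) {
            String op = tokens.get(i);
            Node rhs = new Node(tokens.get(i + 1));
            result = new Node(op, result, rhs);       // ^('+' $expr $i): old result is the left branch
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(build(List.of("1")).toStringTree());
        System.out.println(build(List.of("1", "+", "2", "+", "3")).toStringTree());
    }
}
```

For the input 1 + 2 + 3 the sketch prints (+ (+ 1 2) 3): each new operator node takes the previous result as its left branch, which is the role $expr plays in the rule.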

I would like to build a similar rule, but with a custom action:

mult_expr : (pow_expr -> pow_expr ) 
            (op=MUL exr=pow_expr 
              -> { new BinExpr($op,$mult_expr.tree,$exr.tree) })*;

ANTLR accepts such a rule, but when I run my parser with the input "5 * 3" (for example), it gives me the error "line 1:1 missing EOF at '*'5".

QUESTION: how do I use a back reference with a custom rewrite action?

@Bart Kiers, the workaround is not to use custom actions in the parser grammar, but to rely on the default AST, then write a tree grammar and rewrite the entire AST into a custom one. I would like **very much** to avoid this, because it doubles my work, and the more I see such workarounds the more I doubt the power of ANTLR (it is supposed to save me work ;-D). — greenoldman, Aug 10 '12 at 19:45
Not sure if that is possible... However, would creating your own `CommonTreeAdaptor` be an option? (see: http://stackoverflow.com/questions/7635729/extend-antlr3-asts) — Bart Kiers, Aug 10 '12 at 21:01
@Bart Kiers, I don't see how adaptor changes things here (btw. when you introduce custom AST you have to introduce custom adaptor as well, and I did it). — greenoldman, Aug 10 '12 at 21:13
you have your own adaptor already? Then why aren't you creating instances of your own node-classes in its `create(Token)` method and let your rule just be: `mult_expr : pow_expr (MUL^ pow_expr)*`? — Bart Kiers, Aug 10 '12 at 21:17
@Bart Kiers, you mean I should just mark what the root is (and what to ignore) in the grammar, and put all the logic (what kind of subtree to create) into the adaptor, relying on the root token? How would you tell the difference between binary and unary `-`? — greenoldman, Aug 11 '12 at 09:28
Yes, that is what I meant. In case of unary `-`, simply put `U_SUB` in your `tokens { ... }` block and do: `unary_expr : SUB atom -> ^(U_SUB atom) | atom;`. Then any occurrence of `SUB` inside the `create(...)` method of your adaptor will be the binary `-` and `U_SUB` the unary `-`. — Bart Kiers, Aug 11 '12 at 11:53
@Bart Kiers, I will keep this question open -- you know, maybe somebody will add a solution for this approach -- but anyway, could you repost your comments as an answer? I could accept them then. Thank you in advance (and thanks for the clarification). — greenoldman, Aug 11 '12 at 19:20
sorry, didn't see your last comment. Sure, perhaps there's another way someone knows how to solve it (I'd be interested in knowing it as well, although I doubt there is... :)). And you're welcome of course! — Bart Kiers, Aug 20 '12 at 10:14

score 1 · Answer 1 · edited May 23 '17 at 10:24

I'd recommend creating your own CommonTreeAdaptor and moving the creation of custom nodes into it instead of doing this in your grammar file. For more information, see: Extend ANTLR3 AST's (http://stackoverflow.com/questions/7635729/extend-antlr3-asts)

In case of operators that could have multiple meanings, like the minus sign (binary or unary operator), let your parser rule rewrite the unary operator like this:

grammar X;

...

tokens { U_SUB; } 

add_expr
 : mult_expr ((SUB | ADD)^ mult_expr)*
 ;

...

unary_expr
 : SUB atom -> ^(U_SUB atom)
 | atom
 ;

...

And then in your implementation of your CommonTreeAdaptor, do something like this:

@Override
public Object create(Token t) {
  ...
  switch(t.getType()) {
    case X.SUB   : /* return a binary tree */
    ...
    case X.U_SUB : /* return a unary tree */
  }
  ...
}
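The dispatch idea above can be tried out without the ANTLR runtime. The following is only a standalone model under stated assumptions: `TokenTypes`, `AstNode`, `BinaryNode`, `UnaryNode`, `LeafNode` and `NodeFactoryDemo` are hypothetical names, and the integer constants stand in for the generated token types (X.SUB, X.U_SUB). In a real adaptor the same switch would presumably live inside `create(Token)` and read the type via `t.getType()`, as in the snippet above.

```java
// Hypothetical token type constants; a real grammar would generate these (e.g. X.SUB, X.U_SUB).
final class TokenTypes {
    static final int SUB = 1;    // binary minus
    static final int U_SUB = 2;  // unary minus (imaginary token from the tokens { ... } block)
    static final int INT = 3;

    private TokenTypes() {}
}

// Hypothetical custom node classes; NOT part of ANTLR.
abstract class AstNode {
    final String text;

    AstNode(String text) {
        this.text = text;
    }
}

final class BinaryNode extends AstNode {
    BinaryNode(String text) { super(text); }
}

final class UnaryNode extends AstNode {
    UnaryNode(String text) { super(text); }
}

final class LeafNode extends AstNode {
    LeafNode(String text) { super(text); }
}

final class NodeFactoryDemo {
    // Chooses the node class from the token type alone, mirroring the switch in the answer.
    static AstNode create(int tokenType, String text) {
        switch (tokenType) {
            case TokenTypes.SUB:
                return new BinaryNode(text);
            case TokenTypes.U_SUB:
                return new UnaryNode(text);
            default:
                return new LeafNode(text);
        }
    }

    public static void main(String[] args) {
        System.out.println(create(TokenTypes.SUB, "-").getClass().getSimpleName());
        System.out.println(create(TokenTypes.U_SUB, "-").getClass().getSimpleName());
        System.out.println(create(TokenTypes.INT, "5").getClass().getSimpleName());
    }
}
```

The point of the design is that the grammar only decides which token becomes the root, while the choice of node class depends solely on that token's type, so the two meanings of `-` are separated by giving the unary one its own token type.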

I hope you won't be angry with me for switching "solution" mark, however I finally found the answer to my question. A direct one. — greenoldman, Aug 22 '12 at 16:14

greenoldman · Accepted Answer · 2012-08-22T16:28:17.600

I am a persistent guy, and this idea of using my custom nodes in one step was bothering me... ;-)

So, I did it. The crucial points are:

- putting EOF! at the end of the "main" rule,
- when labeling tokens, putting the label next to each token, not on the group: (op='*'|op='/'), not op=('*'|'/').

I don't know for sure whether using grammar rules to create custom nodes immediately is a good idea, but since this solves the problem asked in the question, I am marking it as the solution.

And for the record, the most interesting rule now looks like this:

mult_expr : (exl=pow_expr -> $exl ) 
        ((op=MUL|op=IDIV|op=RDIV|op=MOD) exr=pow_expr 
        -> { new BinaryExpression($op,$exl.tree,$exr.tree) })*;

No, of course not! :) Thanks for posting the solution. Although I won't be using your solution (I think), it *is* good to know it is possible. — Bart Kiers, Aug 22 '12 at 16:46
