Antlr Tree Pattern Matching with Rewrite Rules

Question

I have a trivial antlr grammar for expressions like a.b.c + d.e.f:

grammar Test;
options {
    output=AST;
}
tokens {
    VARIABLE_ID;
    QUALIFIER_ID;
}

ID  : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
DOT : '.';
WS : ( ' ' | '\t' | '\r' | '\n' ) {$channel=HIDDEN;} ;

variable_id  : id=ID   -> VARIABLE_ID[$id];
qualifier_id  : id=ID   -> QUALIFIER_ID[$id];

expr_start : expr EOF;
expr : var (options {greedy=true;} : '+' expr)*;

var : variable_id (DOT qualifier_id)*;

Now I want to define a pattern matcher over this grammar that turns a.b.c into 0.1.2, so I define a Tree Pattern Matcher as follows

tree grammar TestWalker;
options {
    tokenVocab=Test;
    ASTLabelType=CommonTree;
    filter=true;
    backtrack=true;
}

@members {
    TokenRewriteStream tokens;

    public void setTreeNodeStream(TreeNodeStream input) {
        super.setTreeNodeStream(input);
        tokens = (TokenRewriteStream)input.getTokenStream(); 
    }
}

topdown : var;

variable_id [int i] : id=VARIABLE_ID {
    tokens.replace($id.getToken(), "" + $i);
};

qualifier_id [int i] : id=QUALIFIER_ID {
    tokens.replace($id.getToken(), "" + $i);
};

var 
@init { int index = 0; }
: variable_id[index] 
(   DOT 
    { ++index; }
    qualifier_id[index]
)*;

Then I put together a small test program:

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;

public class Main {
    public static void main(String[] args) throws Exception {
        TestLexer lex = new TestLexer(new ANTLRInputStream(System.in));
        TokenStream tokens = new TokenRewriteStream(lex);

        TestParser parser = new TestParser(tokens);
        TestParser.expr_return expr = parser.expr();

        CommonTreeNodeStream nodes = new CommonTreeNodeStream((Tree)expr.getTree());
        nodes.setTokenStream(tokens);
        TestWalker walker = new TestWalker(nodes);
        walker.downup(expr.getTree());
        System.out.println(tokens.toString());
    }
}

When I run this program with basic input, I see surprising results:
a.b.c -> 0.b.c
a.b + d.e -> 0.b + 0.e
and so on. It appears that the (DOT qualifier_id)* portion of my rule never matches and I cannot figure out why. I have tried adding my rules to the topdown and the bottomup portions of the Tree Pattern Match. If I switch from a filter matcher to a whole tree matcher and add rules to branch for the '+' case appropriately it works, but when the rewrite is just a smaller fragment of a much larger grammar this becomes untenable. Any pointers would be greatly appreciated.

Update: Using antlr 3.3

score 2 · Accepted Answer · answered Mar 08 '11 at 22:10

The crux of the issue (as identified by @Bart) is that the parse was not properly generating an AST. The grammar needs to build the AST (note the addtional ^ tokens).

grammar Test;
options {
    output=AST;
}
tokens {
    VARIABLE_ID;
    QUALIFIER_ID;
}

ID  : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
DOT : '.';
WS : ( ' ' | '\t' | '\r' | '\n' ) {$channel=HIDDEN;} ;

variable_id  : id=ID   -> VARIABLE_ID[$id];
qualifier_id  : id=ID   -> QUALIFIER_ID[$id];

expr_start : expr EOF;
expr : var (options {greedy=true;} : '+'^ expr)*;

var : variable_id (DOT^ qualifier_id)*;

Then the tree pattern matcher needs to walk the AST as it is constructed. Note the structure of the expr rule and the arguments used to handle state.

tree grammar TestWalker;
options {
    tokenVocab=Test;
    ASTLabelType=CommonTree;
    filter=true;
    backtrack=true;
}

@members {
    TokenRewriteStream tokens;

    public void setTreeNodeStream(TreeNodeStream input) {
        super.setTreeNodeStream(input);
        tokens = (TokenRewriteStream)input.getTokenStream(); 
    }
}

topdown : expr[0];

variable_id returns [int r] : id=VARIABLE_ID {
    $r = 0;
    tokens.replace($id.getToken(), "" + $r);
};

qualifier_id [int i] returns [int r] : id=QUALIFIER_ID {
    $r = $i + 1;
    tokens.replace($id.getToken(), "" + $r);
};

expr [int i] returns [int r]
    : v=variable_id { $r = $v.r; }
    | ^(DOT e=expr[$i] q=qualifier_id[$e.r] { $r = $q.r; } )
    ;

Now the output will run as expected:
a.b + d.e + c.d.e.f
0.1 + 0.1 + 0.1.2.3
and the produced AST looks correct.

output AST

Great! Thanks for posting the solution +1. – Bart Kiers Mar 08 '11 at 22:33 — Bart Kiers, Mar 08 '11 at 22:33

score 1 · Answer 2 · answered Mar 08 '11 at 20:30

I am not familiar with this (relatively new) pattern-tree walking. I glanced over it on the Wiki, but it's not in my copy of the ANTLR reference.

There is something not 100% with your Test grammar though: when I generate a parser from it, I get:

java -cp antlr-3.2.jar org.antlr.Tool Test.g
warning(200): Test.g:18:46: Decision can match input such as "'+'" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input

and when I visualize the AST your parser produces using your class (with a small addition):

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;

public class Main {
    public static void main(String[] args) throws Exception {
        TestLexer lex = new TestLexer(new ANTLRStringStream("a.b + d.e"));
        TokenStream tokens = new TokenRewriteStream(lex);

        TestParser parser = new TestParser(tokens);
        TestParser.expr_return expr = parser.expr();

        CommonTreeNodeStream nodes = new CommonTreeNodeStream((Tree)expr.getTree());
        nodes.setTokenStream(tokens);

        CommonTree tree = (CommonTree)expr.getTree();

        DOTTreeGenerator gen = new DOTTreeGenerator();
        StringTemplate st = gen.toDOT(tree);
        System.out.println(st);

        TestWalker walker = new TestWalker(nodes);
        walker.downup(tree);
        System.out.println(tokens.toString());
    }
}

I see that the input a.b + d.e produces the AST:

enter image description here

I imagine your tree walker is iterating over said tree, which doesn't surprise me it doesn't produce the desired result since it's just a flat list of nodes with a single root.

Interesting, that error message does not show up in antlr3.3 (which is what I am using) — Matt, Mar 08 '11 at 20:58
@Matt, well, it's not a serious warning: the lexer & parser _are_ generated after all. What _is_ serious (AFAIK) is the fact that a flat AST is generated. At least, I think your tree-pattern-grammar does not like it... :) — Bart Kiers, Mar 08 '11 at 21:01
I see the same thing using DOTTreeGenerator, but when I use antlr works I see a full parse tree. — Matt, Mar 08 '11 at 21:17
@Matt, ANTLRWorks creates a parse tree on its own (which is not available to you). To create a proper AST, see this [previous Q&A](http://stackoverflow.com/questions/4931346/how-to-output-the-ast-built-using-antlr). — Bart Kiers, Mar 08 '11 at 21:30
@Bart Thanks, let me chug on this for a while and get back to you. — Matt, Mar 08 '11 at 21:34

Antlr Tree Pattern Matching with Rewrite Rules

2 Answers2