ANTLR4 parse tree simplification

Question

Is there any means to get ANTLR4 to automatically remove redundant nodes in generated parse trees?

More specifically, I've been experimenting with a grammar for GLSL and you end up with long linear sequences of "expressions" in the parse tree due to the rule forwarding needed to give the automatic handling of operator precedence.

Most of the generated tree nodes are simply "forward to the next level of precedence", so don't provide any useful syntactic information - you only really need the last expression node in each sequence (i.e. the point at which the rule forwarding stopped), or the point where it becomes an actual tree node with more than one child (i.e. an actual expression was encountered in the source) ...

I was hoping there would be an easy way to kill off the dummy intermediate expression nodes - this type of structure must be common in any grammar with operator precedence.

The basic structure of the grammar is a fairly direct clone taken from the Khronos specification for the language:

https://www.khronos.org/registry/gles/specs/3.1/es_spec_3.1.pdf

Lucas Trzesniewski · Accepted Answer · 2015-03-03T23:26:17.473

6

ANTLR v4 is able to generate code from a single recursive rule dealing with different precedence levels, if you use a grammar like this (example for basic math):

expr : '(' expr ')'
     | '-' expr
     | expr ('*'|'/') expr
     | expr ('+'|'-') expr
     | INT
     ;

ANTLR v3 was unable to do so and basically required you to write one rule per precedence level. So I'd advise you to rewrite your grammar to avoid these boilerplate rules.

Then, I think you're confusing the parse tree (aka concrete syntax tree) with the AST (abstract syntax tree). The AST is like a simplified version of the parse tree, which keeps only what's needed for your purpose. For instance, with the expr rule above, the AST wouldn't contain any node for parentheses, since the precedence is encoded in the tree itself and you usually don't need to know whether a part of a given expression was parenthesized or not.

Your program should build an AST from the parse tree and then go from there. Don't deal with parse trees directly, even if it seems convenient at first sight because the tool generates them for you. It'll quickly become cumbersome. Build your own tree structure (AST), tailored for the task at hand.

edited Mar 03 '15 at 23:26

answered Mar 03 '15 at 23:20

Lucas Trzesniewski

50,214
11
107
158

I'm writing a source-to-source translator, i.e. aiming to remove unused variables, unused functions, etc. Where possible I want to keep source formatting, so would rather not go too abstract, but yeah I agree with the sentiment. – solidpixel Mar 03 '15 at 23:29
P.S. Thanks for the update on the single recursive rule - I think that will solve the problem nicely. (Still trying to wrap my head around the changes in ANTLR4 - quite different to 3 ...). – solidpixel Mar 03 '15 at 23:32
@Isogen74 yeah, it's *very* different from v3, and that's for the better IMHO. Anyway, if you're building a source-to-source translator you're totally right in *not* being too abstract. You *will* need the parse tree for such things. – Lucas Trzesniewski Mar 03 '15 at 23:34
Here's an idea for you (that I've used successfully): build an AST, and in each AST node keep a reference to the related parse tree node. This way you get the best from both. You can have an abstract syntax tree while still being able to relate the nodes to the individual characters in the input. – Lucas Trzesniewski Mar 03 '15 at 23:36
Thanks for the assist - worked a charm as a technique. – solidpixel Mar 07 '15 at 01:11
How do you generate an AST for antlr4 in Java? I could not find any documentation for the same. – Rohit Banga Dec 20 '18 at 03:40
@RohitBanga in ANTLR4 you don't generate an AST automatically anymore as you would do in ANTLR3. You build it manually from the CST. See [my answer here](https://stackoverflow.com/a/29996191/3764814) for an example (it's in C#, but that translates to Java just fine). – Lucas Trzesniewski Dec 20 '18 at 09:11
Ok thanks. That does mean we need to maintain both the grammar and the AST + AST visitor as the grammar evolves. I guess it's doable. Is there a reason for not auto generating the AST as there seems to be an almost one to one mapping between grammar and AST? I guess some hints would be required to skip some of the tokens or rules like parentheses in the expr example. – Rohit Banga Dec 20 '18 at 16:03
@RohitBanga You'd have to ask the tool author, but my guess would be that manually created ASTs provide more flexibility: you have full control. It's certainly explained in the ANTLR book, but it's been a while since I've read it. That said, if your AST maps one-to-one to the CST, then maybe you don't need an AST in the first place :) – Lucas Trzesniewski Dec 20 '18 at 16:28

score 2 · Answer 2 · answered Mar 03 '15 at 23:21

Use the Visitor implementation to access each node in sequence. Build your own tree by adding nodes to parents as they are visited. Decide at the time the node is visited whether to add it to your new tree or not. For example:

public T visitExpression(@NotNull AcParser.ExpressionContext ctx) {
        // Expressionable parent = getParent(Expressionable.class, ctx);
        // Class<? extends AcExpression> expClass = AcExpression.class;
        AcExpression obj = null;
        String text = ctx.getText();

        //do something with text or children
        for (int i=0; i<ctx.getChildCount(); i++){
            printnl(ctx.getChild(i).getText()+"/");
        }

        return visitChildren(ctx);
    }

Thanks - combined with the advice from Lucas (don't be lazy and do a proper AST), this looks like the way to go. — solidpixel, Mar 03 '15 at 23:34

ANTLR4 parse tree simplification

2 Answers2

Linked