3

Due to the nature of top down parsing, ANTLR generates parse trees with some long repetitive structures with a lot of superfluous nodes before reaching a leaf inside expressions.

For example, using the C.g4 grammar (https://github.com/antlr/grammars-v4/tree/master/c) on the following code:

main(){
    int a=5, b=10;

    for(int i =0;i<b;i++){
        b=a--;
    }
}

The tree generated is:

(compilationUnit (translationUnit (externalDeclaration (functionDefinition (declarator (directDeclarator (directDeclarator main) ( ))) (compoundStatement { (blockItemList (blockItemList (blockItem (declaration (declarationSpecifiers (declarationSpecifier (typeSpecifier int))) (initDeclaratorList (initDeclaratorList (initDeclarator (declarator (directDeclarator a)) = (initializer (assignmentExpression (conditionalExpression (logicalOrExpression (logicalAndExpression (inclusiveOrExpression (exclusiveOrExpression (andExpression (equalityExpression (relationalExpression (shiftExpression (additiveExpression (multiplicativeExpression (castExpression (unaryExpression (postfixExpression (primaryExpression 5))))))))))))))))))) , (initDeclarator (declarator (directDeclarator b)) = (initializer (assignmentExpression (conditionalExpression (logicalOrExpression (logicalAndExpression (inclusiveOrExpression (exclusiveOrExpression (andExpression (equalityExpression (relationalExpression (shiftExpression (additiveExpression (multiplicativeExpression (castExpression (unaryExpression (postfixExpression (primaryExpression 10))))))))))))))))))) ;))) (blockItem (statement (iterationStatement for ( (declaration (declarationSpecifiers (declarationSpecifier (typeSpecifier int))) (initDeclaratorList (initDeclarator (declarator (directDeclarator i)) = (initializer (assignmentExpression (conditionalExpression (logicalOrExpression (logicalAndExpression (inclusiveOrExpression (exclusiveOrExpression (andExpression (equalityExpression (relationalExpression (shiftExpression (additiveExpression (multiplicativeExpression (castExpression (unaryExpression (postfixExpression (primaryExpression 0))))))))))))))))))) ;) (expression (assignmentExpression (conditionalExpression (logicalOrExpression (logicalAndExpression (inclusiveOrExpression (exclusiveOrExpression (andExpression (equalityExpression (relationalExpression (relationalExpression (shiftExpression (additiveExpression (multiplicativeExpression (castExpression (unaryExpression (postfixExpression (primaryExpression i)))))))) < (shiftExpression (additiveExpression (multiplicativeExpression (castExpression (unaryExpression (postfixExpression (primaryExpression b))))))))))))))))) ; (expression (assignmentExpression (conditionalExpression (logicalOrExpression (logicalAndExpression (inclusiveOrExpression (exclusiveOrExpression (andExpression (equalityExpression (relationalExpression (shiftExpression (additiveExpression (multiplicativeExpression (castExpression (unaryExpression (postfixExpression (postfixExpression (primaryExpression i)) ++)))))))))))))))) ) (statement (compoundStatement { (blockItemList (blockItem (statement (expressionStatement (expression (assignmentExpression (unaryExpression (postfixExpression (primaryExpression b))) (assignmentOperator =) (assignmentExpression (conditionalExpression (logicalOrExpression (logicalAndExpression (inclusiveOrExpression (exclusiveOrExpression (andExpression (equalityExpression (relationalExpression (shiftExpression (additiveExpression (multiplicativeExpression (castExpression (unaryExpression (postfixExpression (postfixExpression (primaryExpression a)) --))))))))))))))))) ;)))) })))))) })))) )

where the tree substructure that matches the code stub "int a=5" is:

(declaration (declarationSpecifiers (declarationSpecifier (typeSpecifier int))) (initDeclaratorList (initDeclaratorList (initDeclarator (declarator (directDeclarator a)) = (initializer (assignmentExpression (conditionalExpression (logicalOrExpression (logicalAndExpression (inclusiveOrExpression (exclusiveOrExpression (andExpression (equalityExpression (relationalExpression (shiftExpression (additiveExpression (multiplicativeExpression (castExpression (unaryExpression (postfixExpression (primaryExpression 5)))))))))))))))))))

We clearly see that this could approximately be reduced to:

(declaration (declarationSpecifiers (declarationSpecifier (typeSpecifier int))) (initDeclaratorList (initDeclaratorList (initDeclarator (declarator (directDeclarator a)) = (initializer (assignmentExpression (postfixExpression (primaryExpression 5))))))

I am using the parse trees to perform certain static analysis and due to the superfluous nodes shown above I would need to perform a lot of checking on the listener side of the system to visit the correct tree nodes of interest.

So I would like to know if there is a simple way I could modify the parse tree by using set of transformation rules to remove the superfluous nodes and/or reduce the long repetitive structures.

np20
  • 1,935
  • 3
  • 16
  • 24
  • AST conversion were available in Antlr3, and maybe will be in Antlr4. As of now, you have to do it manually.The benefit of doing it manually is that you can combine reduction with conversion and convert the text "123" into an integer "123" etc. The deeply nested structure also allows for more reuse and less repetition in the grammar itself. – Onur Mar 04 '15 at 12:12
  • 2
    What you're seeing is the [parse tree](http://en.wikipedia.org/wiki/Parse_tree), not the [AST](http://en.wikipedia.org/wiki/Abstract_syntax_tree) - don't confuse them! – Lucas Trzesniewski Mar 05 '15 at 23:51
  • Ok, the distinction is important. ANTLR4 generates parse trees and not ASTs. Thanks @LucasTrzesniewski. – np20 Mar 18 '15 at 08:19
  • 1
    This discusses a tool that uses CSTs and eliminate most of the "useless" unary production nodes in expressions, and other bloating CST behavior: http://stackoverflow.com/a/1916687/120163 One could arguably do this with ANTLR CSTs. – Ira Baxter Mar 18 '15 at 11:36
  • [Code to flatten ANTLR4 ParseTrees][1] [1]: http://stackoverflow.com/a/24789100/1494559 – np20 Mar 22 '15 at 19:34

0 Answers0