Due to the nature of top-down parsing, ANTLR generates parse trees in which expressions pass through long, repetitive chains of superfluous nodes before reaching a leaf.
For example, using the C.g4 grammar (https://github.com/antlr/grammars-v4/tree/master/c) on the following code:
main(){
int a=5, b=10;
for(int i =0;i<b;i++){
b=a--;
}
}
The tree generated is:
(compilationUnit (translationUnit (externalDeclaration (functionDefinition (declarator (directDeclarator (directDeclarator main) ( ))) (compoundStatement { (blockItemList (blockItemList (blockItem (declaration (declarationSpecifiers (declarationSpecifier (typeSpecifier int))) (initDeclaratorList (initDeclaratorList (initDeclarator (declarator (directDeclarator a)) = (initializer (assignmentExpression (conditionalExpression (logicalOrExpression (logicalAndExpression (inclusiveOrExpression (exclusiveOrExpression (andExpression (equalityExpression (relationalExpression (shiftExpression (additiveExpression (multiplicativeExpression (castExpression (unaryExpression (postfixExpression (primaryExpression 5))))))))))))))))))) , (initDeclarator (declarator (directDeclarator b)) = (initializer (assignmentExpression (conditionalExpression (logicalOrExpression (logicalAndExpression (inclusiveOrExpression (exclusiveOrExpression (andExpression (equalityExpression (relationalExpression (shiftExpression (additiveExpression (multiplicativeExpression (castExpression (unaryExpression (postfixExpression (primaryExpression 10))))))))))))))))))) ;))) (blockItem (statement (iterationStatement for ( (declaration (declarationSpecifiers (declarationSpecifier (typeSpecifier int))) (initDeclaratorList (initDeclarator (declarator (directDeclarator i)) = (initializer (assignmentExpression (conditionalExpression (logicalOrExpression (logicalAndExpression (inclusiveOrExpression (exclusiveOrExpression (andExpression (equalityExpression (relationalExpression (shiftExpression (additiveExpression (multiplicativeExpression (castExpression (unaryExpression (postfixExpression (primaryExpression 0))))))))))))))))))) ;) (expression (assignmentExpression (conditionalExpression (logicalOrExpression (logicalAndExpression (inclusiveOrExpression (exclusiveOrExpression (andExpression (equalityExpression (relationalExpression (relationalExpression (shiftExpression (additiveExpression (multiplicativeExpression 
(castExpression (unaryExpression (postfixExpression (primaryExpression i)))))))) < (shiftExpression (additiveExpression (multiplicativeExpression (castExpression (unaryExpression (postfixExpression (primaryExpression b))))))))))))))))) ; (expression (assignmentExpression (conditionalExpression (logicalOrExpression (logicalAndExpression (inclusiveOrExpression (exclusiveOrExpression (andExpression (equalityExpression (relationalExpression (shiftExpression (additiveExpression (multiplicativeExpression (castExpression (unaryExpression (postfixExpression (postfixExpression (primaryExpression i)) ++)))))))))))))))) ) (statement (compoundStatement { (blockItemList (blockItem (statement (expressionStatement (expression (assignmentExpression (unaryExpression (postfixExpression (primaryExpression b))) (assignmentOperator =) (assignmentExpression (conditionalExpression (logicalOrExpression (logicalAndExpression (inclusiveOrExpression (exclusiveOrExpression (andExpression (equalityExpression (relationalExpression (shiftExpression (additiveExpression (multiplicativeExpression (castExpression (unaryExpression (postfixExpression (postfixExpression (primaryExpression a)) --))))))))))))))))) ;)))) })))))) })))) )
where the subtree that matches the code fragment "int a=5" is:
(declaration (declarationSpecifiers (declarationSpecifier (typeSpecifier int))) (initDeclaratorList (initDeclaratorList (initDeclarator (declarator (directDeclarator a)) = (initializer (assignmentExpression (conditionalExpression (logicalOrExpression (logicalAndExpression (inclusiveOrExpression (exclusiveOrExpression (andExpression (equalityExpression (relationalExpression (shiftExpression (additiveExpression (multiplicativeExpression (castExpression (unaryExpression (postfixExpression (primaryExpression 5)))))))))))))))))))
This could clearly be reduced to approximately:
(declaration (declarationSpecifiers (declarationSpecifier (typeSpecifier int))) (initDeclaratorList (initDeclaratorList (initDeclarator (declarator (directDeclarator a)) = (initializer (assignmentExpression (postfixExpression (primaryExpression 5))))))
I am using the parse trees to perform static analysis, and because of the superfluous nodes shown above, my listener would need a lot of extra checks to reach the tree nodes of interest.
So I would like to know whether there is a simple way to modify the parse tree, using a set of transformation rules, to remove the superfluous nodes and/or shorten the long repetitive structures.
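To make the kind of transformation rule I have in mind concrete, here is a minimal sketch in plain Python. It does not use the ANTLR runtime: Node, chain, collapse and to_lisp are hypothetical stand-ins I made up for illustration, and the keep parameter is just one possible way to protect rule names that should survive. The single rule it applies is "drop a rule node whose only child is another rule node", which is somewhat more aggressive than the approximate reduction shown above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a parse tree node (not an ANTLR class)."""
    label: str                                   # rule name, or token text for a leaf
    children: List["Node"] = field(default_factory=list)


def chain(labels, leaf):
    """Build a single-child chain of rule nodes ending in a leaf token."""
    node = Node(leaf)
    for label in reversed(labels):
        node = Node(label, [node])
    return node


def collapse(node, keep=frozenset()):
    """Return a copy of the tree without pass-through rule nodes.

    A node is skipped when it has exactly one child, that child is itself
    a rule node (it has children), and the node's label is not in `keep`.
    """
    while (len(node.children) == 1
           and node.children[0].children
           and node.label not in keep):
        node = node.children[0]
    return Node(node.label, [collapse(c, keep) for c in node.children])


def to_lisp(node):
    """Render a tree in the same LISP-like form used above."""
    if not node.children:
        return node.label
    return "(" + " ".join([node.label] + [to_lisp(c) for c in node.children]) + ")"


# The expression chain under "initializer" for the fragment "int a=5".
EXPR_RULES = [
    "assignmentExpression", "conditionalExpression", "logicalOrExpression",
    "logicalAndExpression", "inclusiveOrExpression", "exclusiveOrExpression",
    "andExpression", "equalityExpression", "relationalExpression",
    "shiftExpression", "additiveExpression", "multiplicativeExpression",
    "castExpression", "unaryExpression", "postfixExpression",
    "primaryExpression",
]

tree = Node("initializer", [chain(EXPR_RULES, "5")])
reduced = collapse(tree, keep=frozenset({"initializer"}))
print(to_lisp(reduced))  # prints: (initializer (primaryExpression 5))
```

Nodes with more than one child, such as the relationalExpression for i<b, are left in place by this rule, so the operators that matter for the analysis would still be visible. What I do not know is how best to apply something like this to the trees ANTLR actually produces.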