I'm developing a parser for learning purposes that follows some user-defined rules to create ASTs. During AST generation, however, I can't find a way to abstract the tree - all tokens/intermediate node types are maintaned in the final tree structure.
It's easier to explain with an example:
User-defined structures:
struct ASTExpr { };
struct ASTNumber : ASTExpr { int value; };
struct ASTAddition : ASTExpr { ASTExpr* rhs, lhs; };
User-defined rules:
[0..9] -> ASTNumber
ASTExpr [+] ASTExpr -> ASTAddition
ASTNumber -> ASTExpr
ASTAddition -> ASTExpr
Example input:
1 + 5
Example parsing process/output:
(ASTExpr)
|
(ASTAddition)
|
/----------------\
| | |
(ASTExpr) [+] (ASTExpr)
| |
(ASTNumber) (ASTNumber)
| |
[1] [5]
As you can see the tree is very complex and not very "abstract". The (...)
nodes are classes, the [...]
nodes are tokens.
This is what I would prefer:
(ASTAddition)
|
/----------------\
| |
(ASTNumber) (ASTNumber)
In the desired version of the tree, the tokens and the intermediate ASTExpr
nodes are not present.
How can I make my AST tree more "abstract"? Do I have to change the ruleset? What is usually done to ensure that the final AST contains less noise?