I'm trying to write a general purpose AST builder with a DFS approach.
The principle of the grammars is rather simple: An expression can have one or more options, each option consists of a list of nodes where each node can be either a terminal or an expression.
I have a state that works-ish but and am struggling with finding an exit condition for "endless loops".
If I define the grammar like this:
expr := addExpr;
addExpr := NUMBER
| NUMBER OPERATOR expr
;
It passes through. The structure is semantically correct, but will produce wrong results when traversed. "4 - 2 + 2" gets evaluated as "4 - (2 + 2)" instead of the desired "(4 - 2) + 2".
I would need to define the grammar like this:
expr := addExpr;
addExpr := NUMBER
| expr OPERATOR NUMBER
;
The problem here is, that it leads to an endless iteration:
expr -> addExpr -> expr -> addExpr -> ...
This is where I'm stuck. I cannot come up with an exit condition as to when to stop trying. I was thinking of having a history, if I compared the same expression against the remaining tokens, I can stop. But this stops too early and won't produce any result.
I guess the stack I need would kind of look like this:
expr (against 4 - 2 + 2)
addExpr (against 4 - 2 + 2)
0: NUMBER -> matches 4, won't finish.
1: expr OPERATOR NUMBER
expr (against 4 - 2 + 2) -> I cannot break yet.
addExpr (against 4 - 2 + 2) -> I cannot break yet.
0: NUMBER -> matches 4, won't finish.
1: expr OPERATOR NUMBER
expr (against 4 - 2 + 2) -> I cannot break yet.
addExpr (against 4 - 2 + 2) -> I cannot break yet.
0: NUMBER -> matches 4, will resolve in the end.
1: expr OPERATOR NUMBER
expr (against 4 - 2 + 2) -> BREAK!
It seems kind of arbitrary when to break. Is there some good rule or condition to achieve this?
I don't really want to just set a "random" max-depth if I can avoid it. Maybe I should consider a BFS approach (whole different set of problems), but I'd like to stick with DFS if it's possible.