1

I'm trying to write a general purpose AST builder with a DFS approach.

The principle of the grammars is rather simple: An expression can have one or more options, each option consists of a list of nodes where each node can be either a terminal or an expression.

I have a state that works-ish but and am struggling with finding an exit condition for "endless loops".

If I define the grammar like this:

expr := addExpr;

addExpr := NUMBER
         | NUMBER OPERATOR expr
         ;

It passes through. The structure is semantically correct, but will produce wrong results when traversed. "4 - 2 + 2" gets evaluated as "4 - (2 + 2)" instead of the desired "(4 - 2) + 2".

I would need to define the grammar like this:

expr := addExpr;

addExpr := NUMBER
         | expr OPERATOR NUMBER
         ;

The problem here is, that it leads to an endless iteration:

expr -> addExpr -> expr -> addExpr -> ...

This is where I'm stuck. I cannot come up with an exit condition as to when to stop trying. I was thinking of having a history, if I compared the same expression against the remaining tokens, I can stop. But this stops too early and won't produce any result.

I guess the stack I need would kind of look like this:

expr (against 4 - 2 + 2)
  addExpr (against 4 - 2 + 2)
    0: NUMBER -> matches 4, won't finish.
    1: expr OPERATOR NUMBER
      expr (against 4 - 2 + 2) -> I cannot break yet.
        addExpr (against 4 - 2 + 2) -> I cannot break yet.
          0: NUMBER -> matches 4, won't finish.
          1: expr OPERATOR NUMBER
            expr (against 4 - 2 + 2) -> I cannot break yet.
              addExpr (against 4 - 2 + 2) -> I cannot break yet.
                0: NUMBER -> matches 4, will resolve in the end.
                1: expr OPERATOR NUMBER
                  expr (against 4 - 2 + 2) -> BREAK!

It seems kind of arbitrary when to break. Is there some good rule or condition to achieve this?

I don't really want to just set a "random" max-depth if I can avoid it. Maybe I should consider a BFS approach (whole different set of problems), but I'd like to stick with DFS if it's possible.

Cabadath
  • 897
  • 3
  • 13
  • 23
  • The key here is to iterate rather than recurse. See my answer on how to write recursive descent parsers: https://stackoverflow.com/a/2336769/120163 – Ira Baxter Mar 30 '19 at 16:17
  • Alternatively, build the tree bottom-up (that is, do a postorder traverse rather than a preorder traverse). – rici Mar 30 '19 at 19:38
  • Thank you @IraBaxter for your answer. I am aware of the link you posted. My problem with it is, that I'm not interested in writing a parser for a specific grammar. I wanted to build a generic grammar parser - something like ANTLR. But maybe that's the answer? I cannot just write a DSF search but need to find a way to generate the AST builder functions? – Cabadath Mar 31 '19 at 00:37
  • Interesting @rici. I'm not yet sure how to fit postorder into my algorithm, I'll certainly give it a try. – Cabadath Mar 31 '19 at 00:39
  • Then you should have a close look at MetaII, which generate recursive descent parsers from grammars. And you can implement this a few mind-blowing evenings. See https://stackoverflow.com/a/5739338/120163. Extentions of MetaII called "Tree Meta" (google that) will let you build grammars with integrated tree building, and then let you walk those trees and emit code. – Ira Baxter Mar 31 '19 at 18:21

0 Answers0