4

Let me put the question first: Can I convert a parse tree implementing this particular grammar to an AST trivially.

I was given this grammar to build a parse tree:

literal := INTEGER | FLOAT | TRUE | FALSE .

designator := IDENTIFIER { "[" expression0 "]" } .

op0 := ">=" | "<=" | "!=" | "==" | ">" | "<" .
op1 := "+" | "-" | "or" .
op2 := "*" | "/" | "and" .

expression0 := expression1 [ op0 expression1 ] .
expression1 := expression2 { op1  expression2 } .
expression2 := expression3 { op2 expression3 } .
expression3 := "not" expression3
       | "(" expression0 ")"
       | designator
       | call-expression
       | literal .

For this particular example:

func main() : void {
    let a = 1 + 2 + 3 + 4;
}

My parser will generate (part of) the parse tree as such

            EXPRESSION1
                EXPRESSION2
                  EXPRESSION3
                    LITERAL
                      INTEGER(1)(lineNum:2, charPos:10)
                OP1
                  ADD(lineNum:2, charPos:12)
                EXPRESSION2
                  EXPRESSION3
                    LITERAL
                      INTEGER(2)(lineNum:2, charPos:14)
                OP1
                  ADD(lineNum:2, charPos:16)
                EXPRESSION2
                  EXPRESSION3
                    LITERAL
                      INTEGER(3)(lineNum:2, charPos:18)
                OP1
                  ADD(lineNum:2, charPos:20)
                EXPRESSION2
                  EXPRESSION3
                    LITERAL
                      INTEGER(4)(lineNum:2, charPos:22)

Just notice how these tree branches under EXPRESSION1 go:

EXPRESSION2 + EXPRESSION2 + EXPRESSION2 + EXPRESSION2

which the operator + doesn't correspond to its two operands. So it seems to me that, in the AST conversion, I can't get an AST that aids 3-address IR code generation by simply pulling up the operator to replace the non-terminal EXPRESSION1.

To achieve this goal, the grammar I would have written for this language will be like this instead

expression1 := expression2 | expression1 + expression2  (1)
expression2 := expression3 | expression2 * expression3  (2)
expression3 := literal                                  (3)

which the branches under EXPRESSION1 are only

EXPRESSION1 + EXPRESSION2

However, this grammar is not LL(1) since |FIRST(expression2)| = |{literal, +}| > 1.

It begs the question that (1) what would be the most elegant and trivial way to convert this parse tree? (2) is my construction of the parse tree a complete waste of time for this grammar that I should have started out coding an AST instead?

Davis
  • 110
  • 2
  • 8
  • You might consider the real differences between a parse tree and an abstract syntax tree. See my answer: http://stackoverflow.com/questions/1888854/what-is-the-difference-between-an-abstract-syntax-tree-and-a-concrete-syntax-tre/1916687#1916687 – Ira Baxter Feb 23 '17 at 21:07
  • That seems very nice. I have heard some nice properties about GLR. Now, there is compressed CST to aid. Thanks for the information. – Davis Feb 24 '17 at 03:53
  • Well, this is some school assignment that I wasn't able to figure out before the deadline. There is some specific guideline I have to follow. I was too stupid not to realize there was problem with the expression grammar that the target AST wasn't just a "simplified" version of the CST. The tree structure is different. I am just getting a little pissed off. – Davis Feb 24 '17 at 04:04
  • 1
    That's the trouble with ASTs: they aren't isomorphic to CSTs. This means you have to define a mapping from the CST produced by the parser to the AST that somebody thinks is nice, and then you have to implement that mapping by walking over the CST and/or building AST subtrees as you do reductions. That map, being defined by "nice" is simply ad hoc and extra work that you have to do. With big grammars, this is a royal PIA. My sympathies. Been there, done that. – Ira Baxter Feb 24 '17 at 05:04

1 Answers1

2

I believe you wish to produce an AST like:

     ADD
     /  \
    1   ADD
        /  \  
       2   ADD
           / \
          3   4

But probably you noticed that your parse tree is actually a flat list not representing an easy starting point to group operators and their operands in single pass. Anyway coding a more advanced parser, recognizing tree structure and applying grammar reductions is not a trivial task.

Hence before starting doing that you might wish to consider some existing parser generators like yacc or ANTLR. Probably you will need to rewrite your grammar to create rules centered on the operators instead of treating them as a recursion utility. A grammar not being a classic LL(1) might not be a big problem as modern generators (like ANTLR) can handle LL(k) grammars with bigger lookahead, predicates etc. and just bypass issues of that type.

If you still insist on coding it manually think about using a stack to store symbols and converting them to an AST node once an expression is collected but again it is not a simple task.

jszpilewski
  • 1,632
  • 1
  • 21
  • 19
  • Does this mean sticking with LL(1) grammar is in its nature a bad design because it handles operator precedence but fails at operator associativity? – Davis Feb 23 '17 at 18:36
  • @Davis Targeting LL(1) is fine but quite often not practical and not absolutely necessary when you can use an advanced LL(k) parser generator like ANTLR. – jszpilewski Feb 23 '17 at 18:45