How Do I Build an Abstract Syntax Tree with JJTree?

Question

When building an AST and adding children to the tree, what is the difference between:

void NonTerminal #Nonterminal: { Token t;}
{
    t = <MULTIPLY> OtherNonTerminal() {jjtThis.value = t.image;} #Multiply
}

and:

void NonTerminal : { Token t;}
{
    t = <MULTIPLY> OtherNonTerminal() {jjtThis.value = t.image;} #Multiply(2)
}

Note:

<MULTIPLY : "*">

Are there any major differences and will both of these work the same way?

Also would another way of building the tree for this production rule:

void NonTerminal() : { Token t; }
{
    t = <MULTIPLY> OtherNonTerminal() { jjtThis.value = t.image; } #Mult(2)
|   t = <DIVIDE> OtherNonTerminal() { jjtThis.value = t.image; } #Div(2)
|   {}
}

be like this:

void NonTerminal() #Nonterminal(2) : { Token t; }
{
    (t = <MULTIPLY> OtherNonTerminal() | t = <DIVIDE> OtherNonTerminal() | {}) {jjtThis.value = t.image;}
}

Is the ‘NODE_DEFAULT_VOID` option set to `true`? – Theodore Norvell Nov 18 '18 at 00:06 — Theodore Norvell, Nov 18 '18 at 00:06
Yes its set to true. @TheodoreNorvell – webchatowner Nov 18 '18 at 11:47 — webchatowner, Nov 18 '18 at 11:47

sarath kumar · Answer 1 · 2018-11-19T08:08:50.110

Answer to this Question is yes there is a difference.

JAVACC or JJTREE grammar do compilation process in different steps.

Lexical Analysis, where the individual character are collected and try to frame a token with the Regex Provided in TOKEN, SPECIAL_TOKEN, MORE and SKIP sections. After every successful Lexical analysis a token will be generated.
Syntax analysis, where these tokens will be arranged in a tree called Syntax tree with terminal and non-terminal nodes with the production rules provided. Collecting each and every Token generated from lexical analysis, Syntax analysis tries to validate the Syntax from it.

NON-TERMINAL Node : Indicates other production rule.

TERMINAL Node : Indicates the token or data node.

And here is the difference,

After Successful Syntax verification we required a useful form to make use of it. More useful representation is the Tree Representation, we already have the Syntax tree Generated as a part of Syntax analysis which can be modified to get the useful tree out of it, this is where the JJTree come into picture to rename and create useful tree structure use #NODE_NAME syntax in production rules.

Edit for the Comment as below

Multiply(2) indicates only two Children this make sense if your operation is A*B, if you are performing A*B*C and with #Multiply(2) then the tree will be like

          Multiply
        /          \
  Multiply           C
    /  \
  A     B

if you are performing A*B*C and with #Multiply then the tree will be like

   Multiply    Multiply      Multiply
      |            |             | 
      A            B             C

Basically the difference between #Multiply and #Multiply(2) is Multiply(2) will wait for two tokens for the Node to be generated if found only one throws the exception and #Multiply will generate nodes as and when the Production rule got matched.

So what you are saying is that the AST is being built in a different way? But will they work the same way then, e.g. a multiply operation requires two children, so will #Multipy still work? What happens exactly happens if i don't add the two child nodes, e.g. #Multiply(2)? @sarathkumar — webchatowner, Nov 17 '18 at 12:16
#Multiply(2) indicates only two Children this make sense if your operation is A*B, if you are performing A*B*C and with #Multiply(2) then the tree will be like Multiply Multiply C A B if you are performing A*B*C and with #Multiply then the tree will be like Multiply Multiply Multiply A B C — sarath kumar, Nov 19 '18 at 08:02
Edited the Answer accordingly as above Thanks. @webchatowner — sarath kumar, Nov 19 '18 at 08:11
So basically, both ways of doing it are right, the AST is just being built differently? @sarathkumar — webchatowner, Nov 19 '18 at 09:56

Theodore Norvell · Accepted Answer · 2018-11-19T21:39:29.567

1

In the first case

void NonTerminal #Nonterminal: { Token t;}
{
    t = <MULTIPLY>
    OtherNonTerminal() {jjtThis.value = t.image;}
    #Multiply
}

the Multiply node will have as children all the nodes pushed on the stack during its node scope, excluding any that are popped before the end of the scope. In this case that means all the nodes pushed and not popped during the parsing of OtherNonTerminal.

In the second example

void NonTerminal #void : { Token t;}
{
    t = <MULTIPLY>
    OtherNonTerminal() {jjtThis.value = t.image;} 
    #Multiply(2)
}

the Multiply node will get the two top nodes from the stack as its children.

So probably there is a difference.

The other difference is that the second example doesn't specify a node associated with Nonterminal.

In the first case, this tree will be pushed

        Nonterminal
             |
          Multiply
              |
All nodes pushed (but not popped) during the parsing of OtherNonterminal

In the second case, the parsing of OtherNonterminal will do its thing (popping and pushing nodes), then two nodes will the popped and this tree will be pushed

     Multiply
      |     |
  A child  Another child

For the second question. The difference between

void NonTerminal() #void : { Token t; }
{
    t = <MULTIPLY>
    OtherNonTerminal()
    { jjtThis.value = t.image; }
    #Mult(2)
|
    t = <DIVIDE>
    OtherNonTerminal()
    { jjtThis.value = t.image; }
    #Div(2)
|
    {}
}

and

void NonTerminal() #Nonterminal(2) : {
    Token t; }
{
    ( t = <MULTIPLY> OtherNonTerminal()
    | t = <DIVIDE> OtherNonTerminal()
    | {}
    )
    {jjtThis.value = t.image;}
}

is that the first does not build a node when the empty sequence is matched.

Consider the second way in the the case where the next token is something other than * or /. You'll will get

      Nonterminal
      /        \
  Some node    Some other node
  don't want   you don't want

I'm actually surprised that the second one even gets past the Java compiler, since the reference to t is a potentially uninitialized variable.

edited Nov 19 '18 at 21:39

answered Nov 18 '18 at 00:27

Theodore Norvell

15,366
6
31
45

Is `#Multiply(2)` saying that the multiply operation requires two values(nodes), e.g. x * y and is `#Multiply` specifying that any number of nodes can be pushed onto the tree? – webchatowner Nov 18 '18 at 11:57
1

Yes, '#Multiply' mean the number of children is determined by the number of nodes pushed but not popped since the start of the node scope. – Theodore Norvell Nov 18 '18 at 13:29
Thanks very much! @TheodoreNorvell – webchatowner Nov 18 '18 at 13:54
So be to sure, both work in the same way, but the tree is just being built differently? @TheodoreNorvell – webchatowner Nov 18 '18 at 18:13
I have edited the question and added an example @TheodoreNorvell – webchatowner Nov 18 '18 at 18:22
1

I added to the answer. For future reference, once a question has been answered, it's probably not a good idea to substantively changed or add to the question. Better to ask a new question. – Theodore Norvell Nov 19 '18 at 21:35
1

"both work in the same way, but the tree is just being built differently?" You get different tree for the same input; so I wouldn't say that they "work the same way". – Theodore Norvell Nov 19 '18 at 21:38
Hey, thanks for your reply! I'm still confused a bit, because are you basically saying that for a token that is a binary operator, e.g. multiply, you have to use `#Multiply(2)` and not `#Multiply`? And do you push 2 children nodes because of ' and OtherNonTerminal in `t = OtherNonterminal() #Multiply(2)`? So for example if there was not `OtherNonTerminal()` after the multiply token would it be `t = #Multiply(1)`? – webchatowner Nov 19 '18 at 21:54
1

I'm not saying that you "have to" use one approach or the other. All I'm saying that the various approaches you've suggested produce different results. My advice is figure out what tree you want for each input and design your grammar and JJTree annotations accordingly. Definite nodes are very useful for building appropriate trees for left-associative binary operators.. See my answer at https://stackoverflow.com/questions/26846777/make-a-calculators-grammar-that-make-a-binary-tree-with-javacc for some advice on dealing with binary operators. – Theodore Norvell Nov 20 '18 at 12:59

How Do I Build an Abstract Syntax Tree with JJTree?

2 Answers2