2

I have built a recursive decent parser based on a grammar. Currently my parser only tells if the input sequence of Tokens is accepted by the grammar. I want to return if the grammar accepts the input and an Abstract Syntax Tree. I'm not sure how to do that.

What I have so far is a function corresponding to each production rule in the grammar. I have transformed the grammar so that a Terminal is always the first element of each production rule. Below is a subset of the grammar I am trying to build a syntax tree for.

program -> VAR = exp
exp -> NUM term
exp -> LEFTPAR exp RIGHTPAR
term -> MUL NUM term
term -> DIV NUM term
term -> . (empty)

An example of a function for a rule would be:

public Pair<bool, Token> exp(Token tok)
{
    if (tok.type == NUM)
    {
         return term(tok.next);
    }
    if (tok.type = LEFTPAR)
    {
         Pair<bool, Token> temp = exp(tok.next);
         if (temp.left && temp.right.type == RIGHTPAR)
             return new Pair<bool, Token>(true,temp.right.next);
         return new Pair<bool, Token>(false,null);
    }
}

What would a strategy be to turn functions like this into a syntax tree builder? I tried to pass a tree node as input to all the functions but when there are rules that have multiple non-terminals it gets a little more confusing. It seems like it would be easier to build a parse tree instead, then convert it to an AST afterwords. Any help is appreciated!

McAngus
  • 1,826
  • 18
  • 34
  • 1
    See my answer on how to build a recursive descent parser, which discusses how to build an AST. http://stackoverflow.com/questions/2245962/is-there-an-alternative-for-flex-bison-that-is-usable-on-8-bit-embedded-systems/2336769#2336769 (You don't have to make each rule start with a terminal; iteration can handle many "left-recursion" cases.) – Ira Baxter Feb 22 '16 at 05:05
  • Thanks! That helps a lot! What do you do when you have nullable rule like `term` in my example, or multiple (like in my full grammar)? Seems that the only solution is to have a bunch of if statements to check which ones are null and then don't return them. – McAngus Feb 22 '16 at 05:18
  • 1
    Since you control what node each rule returns, you can pass anything you like up the tree. Yes, you might need complex logic to decide what to actually build at each rule. That's the price of having an AST that doesn't have the exact same shape as the grammar (e.g., a CST). For small grammars, you might be willing to do that extra coding. For a large or quickly evolving grammar, building a tree that's the same shape as the grammar is brainless and easy. You don't get a "real" AST, but if you follow my suggestions about eliminating some nodes as you go, what you get is close enough. – Ira Baxter Feb 22 '16 at 05:22
  • As i have found out that recursive descent parsing is suitable to be implemented through procedural-oriented languages but doesn't exactly suit for Object oriented Programming. – Anil Kumar Feb 23 '16 at 09:07

1 Answers1

1

Here's an example of parsing an argument to a function. This is in C but the idea transfers ie you build the AST as you parse the stream of tokens.

This function parses strings like foo : double

static void parse_arg(parser_obj *obj, AstFunc *func) {
  Token   * tok;
  TokenId   tid = peek(obj);

  if(tid == T_PAREN_R) {
    return;
  }
  EXPECT(T_ID);
  tok = t(obj);
  char *arg_name = tok_value(tok);
  EXPECT_EAT(T_COLON);

  tok = t(obj);
  ctype arg_type = tokid_to_type(tok_id(tok));
  func->ops->new_arg(func, arg_name, arg_type);
}

The func object is effectively a Node in the AST and in this case for multiple arguments when we add a new argument it would be added to a list or to a tree or to whatever data structure you want to use once you're done parsing.

In the following line we're adding an arg to an the object func.

func->ops->new_arg(func, arg_name, arg_type);

The actual internals of func ie the shape of the tree being built or how it's implemented is not visible to the parser.

Harry
  • 11,298
  • 1
  • 29
  • 43