7

I have two questions about how to write a recursive descent parser:

The first is what when you have a nonterminal that can match one of a few different nonterminals? How do you check which way is correct?

Second, how do you build an AST? Using YACC, I can just write a piece of code to execute for every instance of a nonterminal and it has special variables which refer to the "values" of the rules. How do you do a similar thing in a recursive descent parser?

mtk358
  • 565
  • 1
  • 7
  • 20

2 Answers2

6
  1. You try them in order, then backtrack on failure. Or you prove that your language is in LL(k) and look at most k symbols ahead.
  2. For each successful parse of a rule, you construct an object from the result of subrules.

E.g.,

class ASTNode {
  public:
    virtual int eval() = 0;
    virtual ~ASTNode() = 0;
};

// construct this when parsing an integer literal
class Value : ASTNode {
    int v;
  public:
    Value(int v_) : v(v_) {}
    virtual int eval() { return v; }
    virtual ~Value() {}
};

// construct this when parsing "x+y"
class Addition : ASTNode {
    ASTNode *left, *right;
  public:
    Addition(ASTNode *l, ASTNode *r) : left(l), right(r) {}
    virtual int eval() { return l->eval() + r->eval(); }
    virtual ~Addition() { delete left; delete right; }
};
Fred Foo
  • 355,277
  • 75
  • 744
  • 836
  • 2
    @Jörgen: messy, but possible. Updated with a note about LL and lookahead. (My background is in NLP, where every grammar is ambiguous, so backtracking is the first thing I learned about parsing :) – Fred Foo Mar 25 '11 at 19:04
  • shouldn't the ast nodes just contain the parsed information, without actively doing something with it? At least to me, embedding the actual logic (the actual addition in the sample) into the nodes themselves looks like a receipt for much problems later and seems to violate SRP. – stijn Mar 25 '11 at 19:25
  • @stijn: I'll admit that this is not particularly elegant, but I wanted to show a simple example that almost does something useful and still separates the logic more cleanly than the Yacc examples the OP refers to. – Fred Foo Mar 25 '11 at 19:30
  • @stijn: Where else should you put the logic? The nodes having an eval method seems like the perfect way. – mtk358 Mar 25 '11 at 20:18
  • @mtk: in a real interpreter, you'd separate the parsing and execution phases so you can do optimization in between. – Fred Foo Mar 25 '11 at 20:22
  • @mtk: One possibility is to implement the visitor pattern to walk the tree of nodes, and for each node create an object that will perform the logic. These 'logic nodes' will roughly have the same structure as the AstNodes, but as larsman says, there is now an intermediate stage that seperates the parsing from the logic, and can do optimizations etc. Using a visitor also has th benefit that if you want to print out the AST, you don't have to change the nodes but just implement a visitor that does printing only. – stijn Mar 26 '11 at 09:00
1

Or a leasson in how to give yourself a stroke in one easy leasson.

The first is what when you have a nonterminal that can match one of a few different nonterminals? How do you check which way is correct?

You need to look ahead in the stream and make a decision. Its hard to do backtracking on a RDC.

An easier solution is to design your grammar so that it does not need to look ahead (hard).

Second, how do you build an AST?

The return value from the function call is the tree for everything that was parsed by the call. You wrap all the sub calls into another dynamically allocated object and return that.

Community
  • 1
  • 1
Martin York
  • 257,169
  • 86
  • 333
  • 562