Recursive descent parsing - from LL(1) up

Question

The following simple "calculator expression" grammar (BNF) can be easily parsed with the a trivial recursive-descent parser, which is predictive LL(1):

<expr>      :=  <term> + <term>
            |   <term> - <term>
            |   <term>
<term>      :=  <factor> * <factor>
                <factor> / <factor>
                <factor>
<factor>    :=  <number>
            |   <id>
            |   ( <expr> )
<number>    :=  \d+
<id>        :=  [a-zA-Z_]\w+

Because it is always enough to see the next token in order to know the rule to pick. However, suppose that I add the following rule:

<command>   :=  <expr>
            |   <id> = <expr>

For the purpose of interacting with the calculator on the command line, with variables, like this:

calc> 5+5
=> 10
calc> x = 8
calc> 6 * x + 1
=> 49

Is it true that I can not use a simple LL(1) predictive parser to parse <command> rules ? I tried to write the parser for it, but it seems that I need to know more tokens forward. Is the solution to use backtracking, or can I just implement LL(2) and always look two tokens forward ?

How to RD parser generators handle this problem (ANTLR, for instance)?

The grammar is LL(2), but that doesn't mean that you *always* have to look two tokens ahead. You normally look only one token ahead, and two only where required (like in the case where you have to choose between and ). ANTLR for example figures the required lookahead out itself for each rule of the grammar. — stmax, May 23 '12 at 20:58

Remo.D · Accepted Answer · 2008-09-24T18:03:12.650

THe problem with

<command>   :=  <expr>
            |   <id> = <expr>

is that when you "see" <id> you can't tell if it's the beginning of an assignement (second rule) or it's a "<factor>". You will only know when you'll read the next token.

AFAIK ANTLR is LL(*) (and is also able to generate rat-pack parsers if I'm not mistaken) so it will probably handle this grammare considering two tokens at once.

If you can play with the grammar I would suggest to either add a keyword for the assignment (e.g. let x = 8) :

<command>   :=  <expr>
            |   "let" <id> "=" <expr>

or use the = to signify evaluation:

<command>   :=  "=" <expr>
            |   <id> "=" <expr>

score 5 · Answer 2 · answered Oct 13 '10 at 11:06

I think there are two ways to solve this with a recursive descent parser: either by using (more) lookahead or by backtracking.

Lookahead

command() {
    if (currentToken() == id && lookaheadToken() == '=') {
        return assignment();
    } else {
        return expr();
    }
}

Backtracking

command() {
    savedLocation = scanLocation();
    if (accept( id )) {
         identifier = acceptedTokenValue();
         if (!accept( '=' )) {
             setScanLocation( savedLocation );
             return expr();
         }
         return new assignment( identifier, expr() );
    } else {
         return expr();
    }
}

score 2 · Answer 3 · edited Oct 23 '08 at 06:56

The problem is that the grammar:


<command>   :=  <expr>
            |   <id> = <expr>

is not a mutually-recursive procedure. For a recursive decent parser you will need to determine a non-recursive equivalent.

rdentato post's shows how to fix this, assuming you can play with the grammar. This powerpoint spells out the problem in a bit more detail and shows how to correct it: http://www.google.com/url?sa=t&source=web&ct=res&cd=7&url=http%3A%2F%2Fxml.cs.nccu.edu.tw%2Fcourses%2Fcompiler%2Fcp2006%2Fslides%2Flec3-Parsing%26TopDownParsing.ppt&ei=-YLaSPrWGaPwhAK5ydCqBQ&usg=AFQjCNGAFrODJxoxkgJEwDMQ8A8594vn0Q&sig2=nlYKQVfakmqy_57137XzrQ

Can you elaborate ? I don't recall any limitations of mutual-recursiveness, and the Dragon book doesn't mention it. — Eli Bendersky, Sep 24 '08 at 17:59

score 1 · Answer 4 · answered Sep 24 '08 at 18:05

1

ANTLR 3 uses a "LL(*)" parser as opposed to a LL(k) parser, so it will look ahead until it reaches the end of the input if it has to, without backtracking, using a specially optimized determinstic finite automata (DFA).

answered Sep 24 '08 at 18:05

Mark Cidade

98,437
31
224
236

Recursive descent parsing - from LL(1) up

4 Answers4

Lookahead

Backtracking