9

The following simple "calculator expression" grammar (BNF) can be easily parsed with the a trivial recursive-descent parser, which is predictive LL(1):

<expr>      :=  <term> + <term>
            |   <term> - <term>
            |   <term>
<term>      :=  <factor> * <factor>
                <factor> / <factor>
                <factor>
<factor>    :=  <number>
            |   <id>
            |   ( <expr> )
<number>    :=  \d+
<id>        :=  [a-zA-Z_]\w+

Because it is always enough to see the next token in order to know the rule to pick. However, suppose that I add the following rule:

<command>   :=  <expr>
            |   <id> = <expr>

For the purpose of interacting with the calculator on the command line, with variables, like this:

calc> 5+5
=> 10
calc> x = 8
calc> 6 * x + 1
=> 49

Is it true that I can not use a simple LL(1) predictive parser to parse <command> rules ? I tried to write the parser for it, but it seems that I need to know more tokens forward. Is the solution to use backtracking, or can I just implement LL(2) and always look two tokens forward ?

How to RD parser generators handle this problem (ANTLR, for instance)?

Vertexwahn
  • 7,709
  • 6
  • 64
  • 90
Eli Bendersky
  • 263,248
  • 89
  • 350
  • 412
  • 1
    The grammar is LL(2), but that doesn't mean that you *always* have to look two tokens ahead. You normally look only one token ahead, and two only where required (like in the case where you have to choose between and ). ANTLR for example figures the required lookahead out itself for each rule of the grammar. – stmax May 23 '12 at 20:58

4 Answers4

7

THe problem with

<command>   :=  <expr>
            |   <id> = <expr>

is that when you "see" <id> you can't tell if it's the beginning of an assignement (second rule) or it's a "<factor>". You will only know when you'll read the next token.

AFAIK ANTLR is LL(*) (and is also able to generate rat-pack parsers if I'm not mistaken) so it will probably handle this grammare considering two tokens at once.

If you can play with the grammar I would suggest to either add a keyword for the assignment (e.g. let x = 8) :

<command>   :=  <expr>
            |   "let" <id> "=" <expr>

or use the = to signify evaluation:

<command>   :=  "=" <expr>
            |   <id> "=" <expr>
Remo.D
  • 16,122
  • 6
  • 43
  • 74
5

I think there are two ways to solve this with a recursive descent parser: either by using (more) lookahead or by backtracking.

Lookahead

command() {
    if (currentToken() == id && lookaheadToken() == '=') {
        return assignment();
    } else {
        return expr();
    }
}

Backtracking

command() {
    savedLocation = scanLocation();
    if (accept( id )) {
         identifier = acceptedTokenValue();
         if (!accept( '=' )) {
             setScanLocation( savedLocation );
             return expr();
         }
         return new assignment( identifier, expr() );
    } else {
         return expr();
    }
}
Sven
  • 22,475
  • 4
  • 52
  • 71
2

The problem is that the grammar:


<command>   :=  <expr>
            |   <id> = <expr>

is not a mutually-recursive procedure. For a recursive decent parser you will need to determine a non-recursive equivalent.

rdentato post's shows how to fix this, assuming you can play with the grammar. This powerpoint spells out the problem in a bit more detail and shows how to correct it: http://www.google.com/url?sa=t&source=web&ct=res&cd=7&url=http%3A%2F%2Fxml.cs.nccu.edu.tw%2Fcourses%2Fcompiler%2Fcp2006%2Fslides%2Flec3-Parsing%26TopDownParsing.ppt&ei=-YLaSPrWGaPwhAK5ydCqBQ&usg=AFQjCNGAFrODJxoxkgJEwDMQ8A8594vn0Q&sig2=nlYKQVfakmqy_57137XzrQ

John Millikin
  • 197,344
  • 39
  • 212
  • 226
mmattax
  • 27,172
  • 41
  • 116
  • 149
1

ANTLR 3 uses a "LL(*)" parser as opposed to a LL(k) parser, so it will look ahead until it reaches the end of the input if it has to, without backtracking, using a specially optimized determinstic finite automata (DFA).

Mark Cidade
  • 98,437
  • 31
  • 224
  • 236