1

I am trying to write a recursive descent parser without backtracking for a kind of EBNF like this:

<a> ::= b [c] | d

where

  • <a> = non-terminal

  • lower-case-string = identifier

  • [term-in-brackets] = term-in-brackets is optional

  • a|b is the usual mutually exclusive choice between a and b.

For now, I care only about the right hand side.

Following the example at http://en.wikipedia.org/wiki/Recursive_descent_parser, I eventually ended up with the following procedures (rule in GNU bison syntax in comments above):

/* expression: term optional_or_term */
void expression()
{
    term();
    if (sym == OR_SYM)
        optional_or_term();

}

/* optional_or_term: // empty
    | OR_SYM term optional_or_term
*/
void optional_or_term()
{
    while (sym == OR_SYM)
    {
        getsym();
        term();
    }
}

/* term: factor | factor term */
void term()
{
    factor();
    if (sym == EOF_SYM || sym == RIGHT_SQUAREB_SYM)
    {
        ;
    }
    else if (sym == IDENTIFIER_SYM || sym == LEFT_SQUAREB_SYM)
        term();
    else if (sym == OR_SYM)
        optional_or_term();
    else
    {
        error("term: syntax error");
        getsym();
    }

}

/*
factor: IDENTIFIER_SYM  
    | LEFT_SQUAREB_SYM expression RIGHT_SQUAREB_SYM
*/

void factor()
{
    if (accept(IDENTIFIER_SYM))
    {
        ;
    }
    else if (accept(LEFT_SQUAREB_SYM))
    {
        expression();
        expect(RIGHT_SQUAREB_SYM);
    }
    else
    {
        error("factor: syntax error");
        getsym();
    }

}

It seems to be working, but my expectation was that each procedure would correspond closely with the corresponding rule. You will notice that term() does not.

My question is: did the grammar need more transformation before the procedures were written?

Test User
  • 37
  • 1
  • 9

1 Answers1

1

I don't think your problem is the absence of operators for concatenation. I think it is not using Kleene star (and plus) for lists of things. The Kleene star lets you actually code a loop inside a procedure that implements the grammar rule.

I would have written your grammar as:

expression = term (OR_SYM term)*;
term = factor+;
factor = IDENTIFIER_SYM | LEFT_SQUAREB_SYM expression RIGHT_SQUAREB_SYM ;

(This is a pretty classic grammar for a grammar).

The parser code then looks like:

 boolean function expression()
 {   if term()
     {   loop
         { if OR_SYM()
           {  if term()
              {}
              else syntax_error();
           }
           else return true;
         }
     else return false;
 }

 boolean term()
 {  if factor()
    {  loop
       {  if factor()
          {}
          else return true;
       }
    }
    else return false;
 }

 boolean factor()
 {  if IDENTIFIER(SYM)
    return true;
    else 
    { if LEFT_SQUAREB_SYM()
      {  if expression()
         {   if RIGHT_SQUAREB_SYM()
             return true;
             else syntax_error();
         }
         else syntax_error();
      else return false;
    }
 }

I tried to generate this in an absolutely mechanical way, and you can do pretty well like this. I did a lot of this earlier my career.

What you're not going to get is 150 working rules per day. First, for a big language, it is hard to get the grammar right; you'll be tweaking it repeatedly to get a grammar that works in the abstract, then you have to adjust the code you wrote. Next you'll discover that writing the lexer has its troubles too; just try writing a lexer for Java. Finally, you'll discover that parser rules isn't the whole game or even the biggest part of your effort; you need a lot to process real code. I call this "Life After Parsing"; see my bio for more information.

If you want to get 150 working rules per day, switch to a GLR parser and stop coding parsers manually. That won't address the other issues, but it does make you incredibly productive at getting out a usable grammar. This is what I do now. Without exception. (Our DMS Software Reengineering Toolkit uses this, and we parse a lot of things that people claim are hard.)

Community
  • 1
  • 1
Ira Baxter
  • 93,541
  • 22
  • 172
  • 341
  • So yes, I needed to rewrite my grammar using extensions that I was never taught and use more looping and less recursion. Seriously, what the looping does is makes it more convenient to ignore characters that are not wanted _at that moment_. While parsing `[a]`, `a` is recognised as a factor, and I try to gather more factors by calling `factor` recursively. The next character is a ']', which should be ignored at this point so that the calling `factor` which is parsing `'[' expression ']'` can deal with it. I can't ignore it generally, because ']' is not a factor and cannot start a factor. – Test User Sep 30 '14 at 16:50
  • The predicates that check if characters are in the input stream need to be able to back up if what they want is not there. You'll additional trouble with whitespace: where is that consumed? (I've assumed it is automatically consumed by all character-checking routines. That may not be what you want, esp. if you want to keep comments). Your best bet is to go build a couple of parsers like this to get a feel for strenghts (sort of easy to code) vs. weaknesses. In the long term, you'll find using parser generators is easier, because they are stronger and handle more general cases. – Ira Baxter Sep 30 '14 at 17:04
  • 1
    .... I build my first RD parser back in 1969. I still build them for very small grammars when I'm in a very big hurry. But 40 years of experience says, "Use a parser generator". You choose. – Ira Baxter Sep 30 '14 at 17:06
  • Whitespace and comments are stripped out by my lexer. I have a full working version in bison/flex, a recursive descent with backtracking "recogniser" (just tells whether the input is valid or not) and now I am writing a predictive "recogniser". I am not entering the compiler-writing business. I just want to be able to write a parser if I _must_ use a particular language and there is no free (as in free beer) parser generator available, or I am disinclined to learn it for one-time use. I think I'll look back at my original predictive implementation. I probably need to revisit my error handling. – Test User Sep 30 '14 at 18:37