recursive descent parser in C code

Question

I want to make recursive descent parser in C. Given conditions are here:

<prg> -> <stmts>
<stmts> -> <stmt> [;stmts]
<stmt> -> <assign> | <if>
<if> -> if '(' <expr> ')' ('{' <stmts> '}' | <stmt>)
<expr> -> <term> {(+ | -) <term>
<term> -> <factor> {(* | / ) <factor>}
<factor> -> <const> | <id> | '(' <expr> ')'

<id> -> <letter> {<letter>|<const>}
<letter> -> a|b|c
<digit> -> 0|1|2

and I got addChar(), getChar(), lex() func. I was wondering how to process "<"stmt>, "<"if>, "<"letter>. Also, What is the meaning of [;stmts] and {"<"letter>|"<"const>} ?

Closers: it looks to me like the OP has tried, and I understand why he is confused. See my answer. — Ira Baxter, Oct 22 '16 at 14:30

score 2 · Answer 1 · edited May 23 '17 at 12:33

See my SO answer on how to write recursive descent parsers:

Where did you get this grammar? People unfortunately write grammars in BNF with a variety of slightly different notations, and don't tell you what conventions they are using. Check Wikipedia for BNF to see some variations.

Often you have to take your understanding of BNF and its variations, and your understanding of the language the BNF is intended to describe, and interpret the BNF to guess what they actually meant. [That's stupid; the purpose of BNF is to take out all the guesswork]. Regarding

[;stmts]

and using the knowledge that most languages allow a sequence of statements, I guess that [ X ] means "0 or more X in a sequence" and that the specific interpretation here means is "0 or more statements each preceded by a ';'".

Given the apparant convention that a BNF rule for concept X is referenced as <X>, this grammar is broken here; the writer was sloppy and should have written [; <stmts>].

It appears that the convention for indicating literal characters in this BNF style is to use the literal character when it not a character used in the BNF metasyntax. The BNF metasyntax characters used is this grammar appear to be

  < >   ( ) [  ]  {  }  '

with -> as a special token. Any literal character not in this set is simply written as itself; any literal character in this set must be quoted. This interpretation seems to explain most of the grammar with the exception of -. Rules like this may make the grammar look cleaner (sort of) to the author, but make it hard for the reader to understand.

Anyway, that explains why [; stmts] isn't written as [ ';' stmts]. IMHO, the right convention is to always quote literal characters; then the reader never has to guess.

Regarding

{<letter>|<const>}

and again using our understanding of BNF variations, and know that identifiers are usually sequences of letters and digits, I conclude that { X } " means "0 or more of X" and that the intention here is "0 or more letter or digit".

The fact that I interpreted both [ X ] and { X } as meaning "0 or more" suggests my guess is wrong; why use two different notations to mean the same thing? So maybe they mean different things; maybe [ X ] means "one or more". I can't tell by just looking at the grammar.

The fact that I have to guess so much for such a small grammar is bad news. I think the conventions chosen by this grammar author are a disservice to the reader. And that's especially bad news for a beginner because it adds confusion to an already tricky topic.

On behalf of whoever wrote this grammar and gave it to you, I apologize.

But, take the lesson. BNF grammars in many cases are written sloppily. Since you can't fix the authors, you need to learn to interpret what you get. Bummer.

recursive descent parser in C code

1 Answers1