0

My initial problem was to find out whether it is possible to parse the following context-free grammar: grammar_part_1 ; grammar_part_2 (examples) and, if not, edit the grammar so that it is.

I looked for two things: left recursion and ambiguity. Unfortunately, I couldn't find any of those issues apart from the case, where you choose an ident that is similar to a terminal symbol, which is not permitted by definition.

Now, there are three solutions to this problem:

  1. The grammar is parseable by a recursive descent parser without backtracking.
  2. It is the parsers task to obey the definition (e.g. the given rule of "no similar terminal symbol- and ident names") where extending the ident rule with an identification terminal symbol would solve the issue.
  3. There is another kind of grammatical issue apart from the mentioned ones that can occur with these parsers I haven't thought of.

Assuming that my third idea is right, what would these methods be and, if not, are there any other methods to find out whether a grammar is parseable by a recursive descent parser without backtracking? Is assumption 2 true?

---------------- EDIT --------------------

The grammar:

Prog   -> Def^+
Def    -> DEF Left == Expr
Left   -> MAIN : Type
        | Ident ([Ident:Type(, Ident:Type)^*]):Type
Type   -> NAT
        | BOOL
Expr   -> Num
        | TRUE
        | FALSE
        | Ident[([(Expr(, Expr)^*)])]
        | IF Expr THEN Expr [ELSE Expr] FI
Ident  -> (a|...|z)^+
Num    -> (0|...|9)^+

symbols in caps (plus '==', ':', right hand side of Ident and Num) are terminals; (), [], ^+ and ^* are notation operators. Rest is non-terminals

1 Answers1

0

Top-down parsing without backtracking is also impossible if two applicable productions have right-hand sides starting with the same symbol(s). That can require left-factoring.

A more complicated issue is when two non-terminals whose FIRST sets are not disjoint are both the first symbol in different productions for the same non-terminal. While a form of left-factoring might work in this case as well, there is no guarantee that an arbitrary context-free grammar can be parsed with a top-down parser without backtracking.

rici
  • 234,347
  • 28
  • 237
  • 341
  • Ah. I see I provided essentially the same answer as yours. Deleting mine as redundant. – Ira Baxter May 23 '18 at 15:41
  • ... well, maybe not. You suggest refactoring, but there isn't any guarantee that you can reasonably succeed. – Ira Baxter May 23 '18 at 15:51
  • @ira: there is never a guarantee that top-down parsing without backtracking is possible. That doesn't seem to stop people from wanting to do it, or to stop academics from teaching it before moving on to algorithms which work. Those two facts may be related. – rici May 23 '18 at 16:14
  • I get it! For every guarded production you can create two new productions, one containing the old one without the guarded word and another containing the old one plus the guarded word. This applies for the 2nd condition in _Left_ and the 4th condition in _Expr_ and if I rewrite them I have several conditions with same starting symbols thus requiring left-factoring to be parsed by a recursive descent parser without backtracking! Is that right? –  May 23 '18 at 16:45
  • @belserich: i didn't look at your grammar because it's an off-site link. You should include it in your question as text. – rici May 23 '18 at 16:53
  • @belserich: yes, if you transform the grammar by duplicating productions with optional components, you will then need to left-factor. In a real-world recursive descent parser, that probably wouldn't be necessary; the optional segment would simply be placed in an `if` block. But real-world r.d. parsers are computer programs, not theoretical abstractions, and therefore have more flexibility. – rici May 23 '18 at 17:34