1

I wanted to make a reader which reads configuration files similar to INI files for mswin. It is for exercise to teach myself using a lexer/parser generator which I made. The grammar is:

%lexer
HEADER ::= "\\[[0-9a-zA-Z]+\\]"
TRUE ::= "yes|true"
FALSE ::= "no|false"
ASSIGN ::= "="
OPTION_NAME ::= "[a-zA-Z][0-9a-zA-Z]*"
INT ::= "[0-9]+"
STRING ::= "\"(\\\"|[^\"])*\""
CODE ::= "<{(.*)}>"
BLANK ::= "[ \t\f]+" :ignore
COMMENT ::= "#[^\n\r]*(\r|\n)?" :ignore
NEWLINE ::= "\r|\n"
%parser
Options ::= OptionGroup Options | OptionGroup | @epsilon@
OptionGroup ::= HEADER NEWLINE OptionsList 
OptionsList ::= Option NEWLINE OptionsList | Option 
Option ::= OPTION_NAME ASSIGN OptionValue
OptionValue ::= TRUE | FALSE | INT | STRING | CODE 

The problem lies in the @epsilon@ production. I added it because I want my reader to accept also empty files. But I'm getting conflicts when 'OptionsList' or 'OptionGroup' contains an epsilon production. I tried rearrange elements in productions, but I'm only getting conflicts (r/r or s/r, depending of what I did), unless I remove the epsilon completely from my grammar. It removes the problem, but...in my logic one of 'OptionsList' or 'OptionGroup' should contain an epsilon, otherwise my goal to accepting empty files is not met.

My parser generator uses LR(1) method, so I thought I can use epsilon productions in my grammar. It seems I'm good at writing generators, but not in constructing error-less grammars :(.

Should I forget about epsilons? Or is my grammar accepting empty inputs even when there is no epsilon production?

Felix.leg
  • 622
  • 1
  • 4
  • 13

1 Answers1

2

Your Options production allows an Options to be a sequence of OptionGroups, starting with either an empty list or a list consisting of a single element. That's obviously ambiguous, because a list of exactly one OptionGroup could be:

  • The base case OptionGroup
  • The base case @epsilon@ with the addition of an OptionGroup.

In short, instead of

Options ::= OptionGroup Options | OptionGroup | @epsilon@

you need

Options ::= OptionGroup Options | @epsilon@

which matches exactly the same set of sentences, but unambiguously.

In general terms, you are usually better off writing left-recursive rules for bottom-up parsers. So I would have written

Options ::= Options OptionGroup | @epsilon@
rici
  • 234,347
  • 28
  • 237
  • 341
  • It works! But for my information: my generator yet doesn't understand instructions of "Make X a list of Ys" so I had to write my production "on foot", and the result was that line. So, my another question is: how to express that NTerm should contain a list of other elements? – Felix.leg Apr 20 '20 at 16:01
  • @Felix: The recursive productions `ListOfX ::= ListOfX X | X` (for non-empty lists) or `ListOfX ::= ListOfX X | @epsilon@` are exactly how you write "A list of Xs" in BNF. But you have to pick one or the other. Either the list can be empty or it cannot. – rici Apr 20 '20 at 16:03
  • Thanks for your answer, and sorry for the edit, I did it before I have read your answer – Felix.leg Apr 20 '20 at 17:43
  • Quick note on left versus right recursion: a typical parser generator that creates table-driven parsers for LR(1) grammars (or LALR(1)), you do not want to write things to be left-recursive as indicated above. The way you wrote it, as right-recursive, is less likely to cause shift/reduce conflicts. Left recursion is better for LL(1) parsers, which are often recursive-descent. The dragon book is a good place to go for information on these topics, if you want a reference. – Thomas Kammeyer Apr 20 '20 at 18:55
  • @thomas: ll(1) parsers cannot handle left-recursion. Check your references. – rici Apr 20 '20 at 19:05
  • Dragon book, page 47. "It is possible for a recursive-descent parser to loop forever. A problem arises with **left-recursive** productions…" – rici Apr 20 '20 at 19:15
  • But for LR parsing, see the Bison manual (a table-driven LALR(1) parser), section [3.3.3, recursive rules](https://www.gnu.org/software/bison/manual/bison.html#Recursion): "Any kind of sequence can be defined using either left recursion or right recursion, but you should **always use left recursion**…" – rici Apr 20 '20 at 19:18
  • Thanks, my bad. I mixed those up. I think my initial reaction was based on my experience with a different set of algorithms (chart parsers, specifically CYK parsers) and the efficieny issue that I observed when dealing with left-recursive versus right-recursive grammars. I managed to muddle that with the LL v. LR question. I should've checked before I spoke. – Thomas Kammeyer Apr 20 '20 at 19:19
  • Another interesting discussion here: http://gallium.inria.fr/blog/lr-lists/ – Thomas Kammeyer Apr 20 '20 at 19:20
  • 1
    @Thomas: Yes, I'm aware of that opinion. Unlike the claim in that post, bison also allocates the parser stack on the heap. However, unlike Menhir, many yacc implementations (including versions of bison) put an arbitrary limit on the size of the parser stack. Here's an interesting example of how this problem can manifest itself itself: https://unix.stackexchange.com/a/186446/24557. There are syntactic contexts where left-recursion solves a conflict, and there are contexts where right-recursion solves a conflict. There are even environments where two lists need to be opposite recursiveness... – rici Apr 20 '20 at 19:39
  • 1
    But, on the whole, for beginners it is generally best to use the advice in the bison manual, and write lists with left-recursion. For one thing, this will avoid the question which comes up pretty regularly on SO: "Why do my lists print backwards?" – rici Apr 20 '20 at 19:41
  • Thanks for making me feel better about my incorrect comment by using it to provide useful additional information. – Thomas Kammeyer Apr 20 '20 at 20:18