LR(1) parsing table with epsilon productions

Question

I'm having trouble building the collection of sets of items for LR(1) parsers with a grammar containing epsilon productions. For example, given the following grammar (where eps stands for epsilon)

S -> a S U
U -> b
  |  eps

State0 would be

S' -> .S, $
S -> .a S U, $

Moving with 'a' from State0 would give the following state, let's call it State2

S -> a .S U, $
S -> .a S U, $/???

In order to have the lookahead for the second item of State2 I need to calculate FIRST(U$). I know that FIRST(U) = {'b', eps}. My first question is: the lookaheads of the second item of State2 are $ and 'b'? Since U can be eps, my brain tells me that I can have $ as a lookahead as well, not just 'b'. It would have been just 'b' if FIRST(U) would have been just {'b'}. Is that correct?

Second question: at some point I will have a state as the following one

S -> a S .U, $
U -> .b, $
U -> .eps, $

What do I do here? Do I need to move with eps and have a set with the item U -> eps., $? What if I have another terminal as lookahead, i.e. X -> .eps, a/$? And if I move, ending up having a set of the form X -> eps., $, do I reduce?

And more: do I need to insert eps in the parse table as a symbol?

Thanks

What are you using as a textbook? It surprises me that there are no examples with nullable non-terminals. — rici, May 17 '18 at 16:24
The Dragon Book. I don't really know if I missed some parts, but I didn't read about how to handle these kind of situations specifically... — Astinog, May 17 '18 at 19:57
Yup. I dug out my copy of the Dragon Book and indeed there is no worked example of an LR automaton with an ε-production. Still, the book is clear that ε represents the empty sequence of symbols, not a special symbol. (Indeed, the book always uses Greek letters for sequences of symbols and Roman letters for individual symbols, which has become a standard convention.)... — rici, May 19 '18 at 02:05
Also, its definition of FIRST(α) is, in effect, "the set of possible first characters in a derivation of α plus ε if α could derive the empty set, which means that it is a subset of `Σ ∪ ε`. But in the case of FIRST(β$) it is clear that Β$ cannot derive the empty sequence, so the FIRST set must be a subset of Σ. — rici, May 19 '18 at 02:05

rici · Answer 1 · 2019-09-20T02:48:09.203

FIRST(U$) means "the set of symbols which could be first in a derivation of U$". Clearly, if U can derive the empty string, $ must be part of this set. The end-of-input marker $ ensures that we never have to worry about epsilons in the FIRST sets. (If we were doing LR(k) instead of LR(1), we would use k end markers so that all the strings in FIRST_k had length k.
The item associated with U → (or with U → ε if you insist) is U → • . In other words, it is reducible and should trigger a reduce action on matching lookahead.
ε is not a symbol; we only use it (sometimes) to make the empty string visible. But the empty string is empty.

LR(1) parsing table with epsilon productions

1 Answers1

Linked