7

Let's say I have this grammar:

A: ε
 | B 'a'
B: ε
 | B 'b'

What is considered to be the closure of the item A: • B 'a'?
In other words, how do I deal with the epsilon transitions when figuring out closures?

user541686
  • 205,094
  • 128
  • 528
  • 886
  • 1
    Similar question for an LR(1) parser: [LR1 Parser and Epsilon](http://stackoverflow.com/questions/486920/lr1-parser-and-epsilon) – Markus Jarderot Oct 27 '12 at 11:36

1 Answers1

5

This is pretty straightforward. Included in the closure of

    A = ... <dot> X ... ;

are all the rules

    X =   <dot> R1 R2 R3 ... ;

where first(R1) is not empty. For each (nonempty) token K in first(R1), you'll need to (transitively!) include

    R1 = <dot> k ... ;

etc. but presumably you are already clear on this.

You specific question is what happens if R1 can be empty? Then you also need to include

    X =   R1 <dot> R2 ... ;

Similarly for R2 being empty, if R1 can be empty, and similarly for Ri being empty if R1 .. Ri-1 can be empty. In extreme circumstances, all the Ri can be empty (lots of optional subclauses in your grammar), and you can end up including

    X =  R1 R2 ... Rn <dot> ;

Note that determining that first(R1) "can be empty" is itself a transitive closure question.

The GLR parser generator that I built for DMS precomputes first_can_be_empty using Warshall's algorithm and then uses that in the closure construction.

Ira Baxter
  • 93,541
  • 22
  • 172
  • 341
  • Huh, interesting... to clarify, by "Warshall" do you mean Floyd-Warshall? i.e. the all-pairs shortest paths algorithm? – user541686 Oct 29 '12 at 19:16
  • Yes, but you don't need the "shortest" path, just the "existence" of a path, so you can do the whole thing with bits, and then you realize you can do it with bit vectors for a pretty significant speedup. – Ira Baxter Oct 29 '12 at 19:43