2

I am trying to create a LALR(1) parser for the following grammar and find some shift/reduce conflicts.

S := expr
expr := lval | ID '[' expr ']' OF expr
lval := ID | lval '[' expr ']'

The conflicts

So the parser can not parse the string "ID[ID]" correctly. My questions are,

  1. Are there any general ways to convert such non-LALR(1) grammars into LALR(1) grammars?
  2. If two grammars generate exactly the same languages and we know that one is not LALR(1), can we know if the other is LALR(1)?

The grammar mentioned above is only an example, what I really want to know is general ways to solve these grammar problems. Any suggestions or reading recommendations are welcome.

Thanks in advance.

MingyanGuo
  • 55
  • 7

1 Answers1

3

1 . Are there any general ways to convert such non-LALR(1) grammars into LALR(1) grammars?

No. It may or may not be possible to convert an arbitrary context-free grammar (CFG) into an LALR(1) grammar. Moreover, if you have a CFG and an LALR(1) grammar, you cannot tell whether they recognize the same language. (Worse, there is no algorithm which will even tell you whether an arbitrary CFG recognizes every possible string for its alphabet.)

2 . If two grammars generate exactly the same languages and we know that one is not LALR(1), can we know if the other is LALR(1)?

Again, no. As above, there is no algorithm which can verify that two grammars generate the same language, but even supposing that you know that two grammars generate the same language, the fact that one of them is not LALR(1) tells you nothing about the other one.

There is, however, one useful result. If you have an LALR(k) grammar with finite k > 1, then you can generate an LALR(1) grammar. In other words, there is no such thing as an LALR(k) language for k > 1; if a language has an LALR(k) grammar, it has an LALR(k') grammar for any k' such that 1 ≤ k' < k.

That doesn't help you with your grammar, however, because the conflict cannot be eliminated by increasing lookahead to any finite value.

There is an easy way to get rid of that particular shift-reduce conflict, though, and it is a technique which often works. Consider the two conflicting rules:

lval := lval '[' expr ']'
expr := ID   '[' expr ']' OF expr

The problem is that in the first case, ID must be reduced to lval immediately (or at least, before the following expr is reduced), but in the second case it may not be reduced to lval. But we can't tell which case we're in until we reduce the expr and encounter the OF (or not).

If we could finish the lval production without doing the interior lval reduction, then we wouldn't have a problem, because the actual reduction would happen when the token following the ] is visible.

There's probably a technical term for this, but I don't know it. I've always described it as "reduction deferral", and in many cases it is not very difficult:

lval' := ID `[` expr `]`
      |  lval' `[` expr `]`
lval  := ID
      |  lval'
expr  := lval
      |  ID '[' expr ']' OF expr
rici
  • 234,347
  • 28
  • 237
  • 341
  • A very informative answer. A question about the techniques you used in the answer, how can I find these common techniques on grammar transforms? Thanks a lot. – MingyanGuo Aug 29 '13 at 04:01
  • @MingyanGuo: I wish I had a reference. I'm sure I didn't think that one up by myself, but I wasn't able to google a citation, and I've been using it for so long that I can't remember where it comes from. Sorry. The undecidability result is simple and well-known; it's based on constructing a grammar which represents a Turing machine's computation. I'm pretty sure it's in Hopcroft&Ullman. I got the LALR(k>1) reduction from Sippu&Soisalon-Soinenen (Parsing Theory) but I think you can find it in just about any good reference. – rici Aug 29 '13 at 04:05
  • 1
    Actually, now that I think of it, the technique I used is very similar to the LR(k) covering algorithm as described by S&S-S. Suppose you construct an LR(k) parser for a grammar; in effect, you've computed the set of possible lookaheads for every production. So you can replace every non-terminal N with a set of non-terminals (N,L1,L2...Lk-1), working out the substitutions for every non-terminal in the RHS. That obviously gives you an LR(1) grammar. Sometimes you can do something similar with non-canonical lookaheads (i.e. including non-terminals). I think that's essentially what I did. – rici Aug 29 '13 at 04:11
  • I see, I need understand some more general grammars and parsing techniques. Thanks. – MingyanGuo Aug 29 '13 at 04:26
  • 1
    There's an easier path. Use GLR parsers; they will parse any context-free grammar. C++ is famously hard to parse using LALR(1), e.g., you can do it but only by hacking the lexer/parser/symbol table modules to be intricately coupled. See http://stackoverflow.com/a/6320330/120163 for contrast. – Ira Baxter Aug 29 '13 at 05:16