4

I have created a grammar, a stripped-down version of which is reproduced below:

(0) exp1: ternary;
(1) exp1: exp2;
(2) ternary: exp2 "?" exp1 ":" exp1;
(3) exp2: exp2 "+" exp3;
(4) exp2: exp3;
(5) exp3: maybe;
(6) exp3: "1";
(7) maybe: exp3 "?";

I believe this language is unambiguous, and should be LR-parsable. (Please let me know if I'm wrong!)

However, when I attempt to generate an LR(1) parser for this grammar, I get shift/reduce conflicts, because when the parser sees exp3 with lookahead ?, it doesn't know whether to shift or reduce:

Conflicts in state 3:
    Reduction using rule 4: exp2:  exp3 · | "?"
    Shift to state 6

Conflicts in state 9:
    Reduction using rule 3: exp2:  exp2 "+" exp3 · | "?"
    Shift to state 6

Conflicts in state 13:
    Reduction using rule 4: exp2:  exp3 · | "?"
    Shift to state 16

Conflicts in state 20:
    Reduction using rule 4: exp2:  exp3 · | "?"
    Shift to state 23

Conflicts in state 25:
    Reduction using rule 3: exp2:  exp2 "+" exp3 · | "?"
    Shift to state 23

Conflicts in state 28:
    Reduction using rule 3: exp2:  exp2 "+" exp3 · | "?"
    Shift to state 16

Is there a reasonable way for me to make this language LR(1)-parsable (with no conflicts)?

Or are GLR (or maybe LR(2)?) my only realistic options for a language like this?
(Or am I even wrong in believing that the language is unambiguous in the first place?)


For reference, the ambiguous state machine I generated is the following (where ♦ is EOF):

State 0:
    exp1:  · ternary | {♦} → shift 1
    ternary:  · exp2 "?" exp1 ":" exp1 | {♦} → shift 2
    exp2:  · exp2 "+" exp3 | {"?", "+"} → shift 2
    exp2:  · exp3 | {"?", "+"} → shift 3
    exp3:  · maybe | {"?", "+"} → shift 4
    exp3:  · "1" | {"?", "+"} → shift 5
    maybe:  · exp3 "?" | {"?", "+"} → shift 3

State 1:
    exp1:  ternary · | {♦} → reduce 0

State 2:
    ternary:  exp2 · "?" exp1 ":" exp1 | {♦} → shift 7
    exp2:  exp2 · "+" exp3 | {"?", "+"} → shift 8

State 3:
    exp2:  exp3 · | {"+"} → reduce 4
    exp2:  exp3 · | {"?"} → reduce 4 shift 6
    maybe:  exp3 · "?" | {"?", "+"} → reduce 4 shift 6

State 4:
    exp3:  maybe · | {"?", "+"} → reduce 5

State 5:
    exp3:  "1" · | {"?", "+"} → reduce 6

State 6:
    maybe:  exp3 "?" · | {"?", "+"} → reduce 7

State 7:
    exp1:  · ternary | {":"} → shift 10
    exp1:  · exp2 | {":"} → shift 11
    ternary:  · exp2 "?" exp1 ":" exp1 | {":"} → shift 11
    ternary:  exp2 "?" · exp1 ":" exp1 | {♦} → shift 12
    exp2:  · exp2 "+" exp3 | {"?", ":", "+"} → shift 11
    exp2:  · exp3 | {"?", ":", "+"} → shift 13
    exp3:  · maybe | {"?", ":", "+"} → shift 14
    exp3:  · "1" | {"?", ":", "+"} → shift 15
    maybe:  · exp3 "?" | {"?", ":", "+"} → shift 13

State 8:
    exp2:  exp2 "+" · exp3 | {"?", "+"} → shift 9
    exp3:  · maybe | {"?", "+"} → shift 4
    exp3:  · "1" | {"?", "+"} → shift 5
    maybe:  · exp3 "?" | {"?", "+"} → shift 9

State 9:
    exp2:  exp2 "+" exp3 · | {"+"} → reduce 3
    exp2:  exp2 "+" exp3 · | {"?"} → reduce 3 shift 6
    maybe:  exp3 · "?" | {"?", "+"} → reduce 3 shift 6

State 10:
    exp1:  ternary · | {":"} → reduce 0

State 11:
    exp1:  exp2 · | {":"} → reduce 1
    ternary:  exp2 · "?" exp1 ":" exp1 | {":"} → shift 26
    exp2:  exp2 · "+" exp3 | {"?", ":", "+"} → shift 27

State 12:
    ternary:  exp2 "?" exp1 · ":" exp1 | {♦} → shift 17

State 13:
    exp2:  exp3 · | {":", "+"} → reduce 4
    exp2:  exp3 · | {"?"} → reduce 4 shift 16
    maybe:  exp3 · "?" | {"?", ":", "+"} → reduce 4 shift 16

State 14:
    exp3:  maybe · | {"?", ":", "+"} → reduce 5

State 15:
    exp3:  "1" · | {"?", ":", "+"} → reduce 6

State 16:
    maybe:  exp3 "?" · | {"?", ":", "+"} → reduce 7

State 17:
    exp1:  · ternary | {♦} → shift 1
    exp1:  · exp2 | {♦} → shift 18
    ternary:  · exp2 "?" exp1 ":" exp1 | {♦} → shift 18
    ternary:  exp2 "?" exp1 ":" · exp1 | {♦} → shift 19
    exp2:  · exp2 "+" exp3 | {♦, "?", "+"} → shift 18
    exp2:  · exp3 | {♦, "?", "+"} → shift 20
    exp3:  · maybe | {♦, "?", "+"} → shift 21
    exp3:  · "1" | {♦, "?", "+"} → shift 22
    maybe:  · exp3 "?" | {♦, "?", "+"} → shift 20

State 18:
    exp1:  exp2 · | {♦} → reduce 1
    ternary:  exp2 · "?" exp1 ":" exp1 | {♦} → shift 7
    exp2:  exp2 · "+" exp3 | {♦, "?", "+"} → shift 24

State 19:
    ternary:  exp2 "?" exp1 ":" exp1 · | {♦} → reduce 2

State 20:
    exp2:  exp3 · | {♦, "+"} → reduce 4
    exp2:  exp3 · | {"?"} → reduce 4 shift 23
    maybe:  exp3 · "?" | {♦, "?", "+"} → reduce 4 shift 23

State 21:
    exp3:  maybe · | {♦, "?", "+"} → reduce 5

State 22:
    exp3:  "1" · | {♦, "?", "+"} → reduce 6

State 23:
    maybe:  exp3 "?" · | {♦, "?", "+"} → reduce 7

State 24:
    exp2:  exp2 "+" · exp3 | {♦, "?", "+"} → shift 25
    exp3:  · maybe | {♦, "?", "+"} → shift 21
    exp3:  · "1" | {♦, "?", "+"} → shift 22
    maybe:  · exp3 "?" | {♦, "?", "+"} → shift 25

State 25:
    exp2:  exp2 "+" exp3 · | {♦, "+"} → reduce 3
    exp2:  exp2 "+" exp3 · | {"?"} → reduce 3 shift 23
    maybe:  exp3 · "?" | {♦, "?", "+"} → reduce 3 shift 23

State 26:
    exp1:  · ternary | {":"} → shift 10
    exp1:  · exp2 | {":"} → shift 11
    ternary:  · exp2 "?" exp1 ":" exp1 | {":"} → shift 11
    ternary:  exp2 "?" · exp1 ":" exp1 | {":"} → shift 29
    exp2:  · exp2 "+" exp3 | {"?", ":", "+"} → shift 11
    exp2:  · exp3 | {"?", ":", "+"} → shift 13
    exp3:  · maybe | {"?", ":", "+"} → shift 14
    exp3:  · "1" | {"?", ":", "+"} → shift 15
    maybe:  · exp3 "?" | {"?", ":", "+"} → shift 13

State 27:
    exp2:  exp2 "+" · exp3 | {"?", ":", "+"} → shift 28
    exp3:  · maybe | {"?", ":", "+"} → shift 14
    exp3:  · "1" | {"?", ":", "+"} → shift 15
    maybe:  · exp3 "?" | {"?", ":", "+"} → shift 28

State 28:
    exp2:  exp2 "+" exp3 · | {":", "+"} → reduce 3
    exp2:  exp2 "+" exp3 · | {"?"} → reduce 3 shift 16
    maybe:  exp3 · "?" | {"?", ":", "+"} → reduce 3 shift 16

State 29:
    ternary:  exp2 "?" exp1 · ":" exp1 | {":"} → shift 30

State 30:
    exp1:  · ternary | {":"} → shift 10
    exp1:  · exp2 | {":"} → shift 11
    ternary:  · exp2 "?" exp1 ":" exp1 | {":"} → shift 11
    ternary:  exp2 "?" exp1 ":" · exp1 | {":"} → shift 31
    exp2:  · exp2 "+" exp3 | {"?", ":", "+"} → shift 11
    exp2:  · exp3 | {"?", ":", "+"} → shift 13
    exp3:  · maybe | {"?", ":", "+"} → shift 14
    exp3:  · "1" | {"?", ":", "+"} → shift 15
    maybe:  · exp3 "?" | {"?", ":", "+"} → shift 13

State 31:
    ternary:  exp2 "?" exp1 ":" exp1 · | {":"} → reduce 2
user541686
  • 205,094
  • 128
  • 528
  • 886
  • The fact that a grammar isn't ambiguous doesn't necessarily mean that it's LR-parsable. Any *deterministic* CFL has at least one LR(1) grammar, and all deterministic CFLs also have unambiguous grammars. However, not all unambiguous grammars describe deterministic CFLs. – templatetypedef Mar 03 '13 at 20:23
  • @templatetypedef: Right, I know unambiguity doesn't imply LR-parsability, but I thought this one was indeed LR-parsable (seems like LR(2) would handle it fine, not sure though). Are you saying it isn't? (e.g. Does it require unbounded lookahead?) – user541686 Mar 03 '13 at 20:24

1 Answers1

1

I think this might be a precedence issue. The conflicts you're getting occur when the parser is looking at something like this:

 a + b ? c : d

At the time that the parser has seen a + b ? and is looking at c, it can't decide whether it needs to

  1. Reduce b?, so that it will parse an expression of the form a + (b?) and then continue from there, or

  2. Reduce a + b, so that it will parse an expression of the form (a + b) ? c : d

I think the challenge here is that in one case, ? has very low precedence (when used as a ternary operator), and in another case it has very high precedence (when used as a unary operator). However, if you did assign the precedences this way, I think that the parser might be able to disambiguate these cases.

Hope this helps!

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
  • Hmm, `a + b ? c : d` should be parsed as `(a + b) ? c : d` and not `a + (b ? c : d)` though, right? In which case, once it sees `c`, it knows the construct is a ternary expression. (I was trying to keep the semantics of `?` in languages like C.) – user541686 Mar 03 '13 at 20:29
  • @Mehrdad- Ah, yes, my mistake. That's the real shift-reduce conflict. I'll update my answer. However, I still think it's a precedence issue. – templatetypedef Mar 03 '13 at 20:31
  • Ah yes, okay, so it's a precedence issue from *our* standpoint (I indeed need a unary `?` to bind tighter than a ternary `?`), but the parser generator doesn't really have any notion of "precedence"; it's just parsing the grammar, and the precedence is implied by the grammar. I'm not sure what you mean by "if you did assign the precedences this way"... are you proposing I change the precedences implied by the grammar (which I can't do, because the semantics would change), or did you mean I should fix it some other way? – user541686 Mar 03 '13 at 20:34
  • What parser generator are you using? Most standard LR parser generators (`yacc`, `bison`, etc.) allow for precedence specifications, since it dramatically simplifies writing the grammar. – templatetypedef Mar 03 '13 at 20:42
  • It's one I wrote myself. :) – user541686 Mar 03 '13 at 20:44
  • 1
    @Mehrdad- You are awesome! It's actually not that hard to incorporate precedences into an LR(1) parser generator, and it dramatically simplifies grammar creation. The basic idea is that any time you get a shift/reduce conflict, you determine whether to shift or reduce based on the precedence of the first terminal in the production as compared to the lookahead. The Purple Dragon Book has some details about how to do this. Barring that, you'd need to rewrite a lot of the grammar to implicitly enforce a precedence, which is pretty tricky. It's your call. – templatetypedef Mar 03 '13 at 20:48
  • Thanks! That's good info, I might do it sometime. :) But hmm, I'm actually not convinced precedence would solve it. The parser doesn't even have a *chance* to see `c` here, as you had in your example. Rather, it gets confused when it sees an `exp3`, and has `?` as its lookahead token. How can precedence solve that problem? It can't see what's after `?` yet. I feel like I *have* to rewrite the grammar (while still keeping the tightness of `?`'s binding), but I have no idea *how* to do it, especially since I'm not sure how precedence would solve it. – user541686 Mar 03 '13 at 20:50