0

I'm writing a programming language parser and I'm stuck in this Shift/Reduce Conflict.

Here's the conflict state in the parser.output file obtained via running bison with -v

State 1

   24 ident: TIDENT .
   26 call: TIDENT . TLPAREN args TRPAREN

    TLPAREN  shift, and go to state 24

    TLPAREN   [reduce using rule 24 (ident)]
    $default  reduce using rule 24 (ident)

The conflict is happening when I'm trying to implement a rule for call, it seems to conflict with the normal ident rule.

Here's some parts of the grammar, (actions removed for simplicity but they shouldn't be needed. also I'm not sure if the order in which rules are defined matters, correct me if I'm wrong)

(capital letters are tokens)

The ident rule is simply

ident: TIDENT
          ;

Args, used by call.

args: /* empty */
        |
        expr
        |
        args TCOMMA expr
        ;

Call for calling a function

call:
       TIDENT TLPAREN args TRPAREN
       ;

Expr for expressions

expr:
    number
    |
    ternary
    |
    bool
    |
    string
    |
    ident
    |
    call
    |
    TLPAREN expr TRPAREN
    |
    expr TPLUS expr
    |
    expr TMINUS expr
    |
    expr TSLASH expr
    |
    expr TSTAR expr
    |
    expr TGT expr
    |
    expr TGE expr
    | 
    expr TLT expr
    |
    expr TLE expr
    ;

The question: why does the grammar have a shift/reduce conflict and how do you fix it? I've seen similar style parsers that do not have the conflicts its really weird.

If you need to see the full grammar for reproducing here's a hastebin https://hasteb.in/zozifopi.shell

If you need more details about anything else then please let me know in the comments and I'll edit the question accordingly.

PoLLeN
  • 61
  • 7
  • Welcome to Stck Overflow. Please read the [Ask] and [About] pages soon, but more urgently, read about how to create an MCVE ([MCVE]). You've provided fragments of a grammar, but not a complete, minimal grammar that reproduces the problem you're running into. We have to work hard to see what's going on; we can't just copy'n'paste from your question. – Jonathan Leffler Feb 14 '19 at 18:20
  • Sorry, just edited to include the full code – PoLLeN Feb 14 '19 at 18:25
  • Full code isn't what's wanted — the _Minimal_ in MCVE is important too. I'll look at it. My first guess at a minimal version of your grammar didn't reproduce your SR conflict (it had an SR conflict, but not that one). – Jonathan Leffler Feb 14 '19 at 18:26
  • @pollen: Full code is not the same as "a link to a pastebin which at some point in time contained the code". You should include the entire grammar, along with relevant declarations (precedence and token declarations; semantic declarations only if relevant to your question). Personally, I appreciate styles which don't use so much vertical whitespace since I often look at questions with a small screen device. But that's just me; you don't need to take it into account. What's important is that the included code can be used without modification, and that it exhibits the problem. – rici Feb 14 '19 at 21:24

3 Answers3

2

The fundamental problem here is that your grammar is ambiguous because statements don't need to be terminated (stmts: stmts stmt) and a statement can be an expression. So two expressions can appear one after another without any punctuation. That means that f(3) could be one expression (a function call) or two expressions (f and (3)).

If you're happy for the parser to always interpret that as a function call, (which is its default behaviour, since it prefers to shift), then you could just add a couple of precedence declarations, so that the call has higher precedence than the reduction:

%precedence TIDENT
//...
%precedence TLPAREN
// ...
%%
expr : ident %prec TIDENT

That just papers over the ambiguity, and may cause surprising parses. But the only other solution is to make the language unambiguous.

rici
  • 234,347
  • 28
  • 237
  • 341
  • ah, finally a good answer, thank you for your clear explanation, statement terminators is planned but i figured i would start simple for now didn't think about that problems. – PoLLeN Feb 14 '19 at 22:00
0

The problem is that when the parser has shifted a TIDENT token and looks ahead to a TLPAREN token, the grammar permits two alternatives:

  1. reduce the TIDENT to an ident, or
  2. shift the TLPAREN.

Bison will normally resolve shift / reduce conflicts by choosing to shift, and if that's what you want in this case then you could simply ignore the warning.

In this particular case, however, you should be able to resolve the conflict by changing the rule for the call production:

call:
       ident TLPAREN args TRPAREN
       ;

With that rule, it is no longer an option to shift the TLPAREN without first reducing the TIDENT to an ident.

Alternatively, you could consider removing the ident non-terminal altogether, and instead using TIDENT directly wherever you now use ident. There may be other alternatives, too. Which works best for you may depend on what you're trying to do with your semantic actions. I can't comment any more specifically on that, as you have chosen to keep the semantic actions out of our consideration.

John Bollinger
  • 160,171
  • 8
  • 81
  • 157
  • Using TIDENT directly everywhere didn't work, i tried using the ident non-terminal for the call rule like your first suggestion but that instead added up an extra shift/reduce conflict. – PoLLeN Feb 14 '19 at 19:01
  • ok ignore the part about the extra conflict that was another fault, but still i get the one conflict from ident – PoLLeN Feb 14 '19 at 19:10
  • Looking at your full grammar, @PoLLeN, using `TIDENT` doesn't work because of token type mismatches that were not evident in your question. As for switching to using `ident` in the production for `call` you may still get a conflict, even a similar one, but you cannot get *the same* conflict, because one of the states involved in the original one does not exist the revised grammar. – John Bollinger Feb 14 '19 at 20:00
  • Overall, SO is not a venue for extended debugging collaboration. I have answered the question posed: "why does the grammar have a shift/reduce conflict and how do you fix it?" It looks like to move past the next conflict you'll need to refactor your production for `expr`. Additionally, you may find that the language you're trying to parse is not LALR(1), so that any bison grammar you write will necessarily contain one or more conflicts. If it's resolving the ambiguity correctly, then this may be an issue to just let go. – John Bollinger Feb 14 '19 at 20:05
  • Bison by default chooses to shift which is what i want but for some reason i don't feel comfortable with that warning in there, could there be a way to tell bison that you do want the default action so as to not emit a warning? – PoLLeN Feb 14 '19 at 20:10
  • Not exactly, @PoLLeN, but you can use an `%expect` directive to tell it that you expect there to be one conflict. As long as there are exactly that many, Bison will not emit a warning. – John Bollinger Feb 14 '19 at 20:16
0

Bison by default generates a LR parser, which is a bottom-up parser which can decide on each state if shifting a token or reducing it.

The conflict is really simple and well explained by the output itself (I wonder what's not clear), it's telling you:

If I find an IDENTIFIER should I reduce it through rule 24 to an ident non terminal or should I shift it as in call rule?

This because once reduced you can't shift and viceversa, which created indeed a conflict.

To solve the conflict you need to move that choice in the same state of the parser so that it's able to decide by the context.

Since ident has just a terminal IDENT rule and same applies for call you could easily move everything at same level to make it always shift:

expr: 
  IDENT | 
  IDENT LPAREN args RPAREN |
  ...

or use the same ident non terminal for both call and expr itself so it will always reduce it.

Jack
  • 131,802
  • 30
  • 241
  • 343
  • I get the conflict but just not sure how to solve it, I tried your solution of putting the ident and call rules in expr instead of a seperate non-terminal rule but that still didn't solve the conflict. – PoLLeN Feb 14 '19 at 18:39
  • @PoLLeN: you need to remove `call` and `ident` from `expr` non terminals of course – Jack Feb 14 '19 at 18:41
  • I did that, here's the code again incase i miss understood https://hasteb.in/gacehava.shell – PoLLeN Feb 14 '19 at 18:44
  • Oh, that's because you're using `ident` non terminal in other rules, my answer applied to your specific rules. Usually there's not a panacea for conflicts in a Bison grammar, you need to understand why this is happening, try to use `ident` non terminal EVERYWHERE instead that `IDENT`. – Jack Feb 14 '19 at 18:47
  • Um what i don't really get it now, use ident non-terminal everywhere? Weren't you saying to use the TIDENT token directly, non of other rules are using ident non-terminal at the moment i just kept it there incase i need it later. – PoLLeN Feb 14 '19 at 18:52
  • yes I suggested it if you are able to use TINDENT in a single rule, if you have multiple TINDENT around my solution doesn't work anymore, which is why I'm trying to explain that you need to visualize the state machine of your parser to be able to figure out the best solution for your grammar. – Jack Feb 14 '19 at 18:53
  • Ok i kinda get your answer now, but i can't use ident non-terminal everywhere, since it creates an ast node for a variable acess when i usually only need the string value of it. – PoLLeN Feb 14 '19 at 19:15