2

I have the following Bison grammar snippet:

binary_op:         BINARY_OP
                    {
                        ...
                    }
                    | '|' %prec BINARY_OP
                    {
                        ...
                    }
;

non_keyword_expr:   non_keyword_expr binary_op non_keyword_expr %prec BINARY_SEND_PREC %dprec 2
                    {
                        ...
                    }
;

| has overloaded meaning in my grammar, so I can't just return it as token BINARY_OP from my lexer. It could be a different token depending on context.

If I use this as my input:

4 OR 5 OR 6

I can parse it successfully (OR is recognized as a BINARY_OP token by the lexer).

If, however, my input is this:

4 | 5 | 6

I get an ambiguous grammar error. (the | isn't being recognized as left associative)

How can I get binary_op to be left-associative within non_keyword_expr? the %prec statement on the second rule for binary_op seems to have no effect.

edit: this is for a GLR parser

nielsbot
  • 15,922
  • 4
  • 48
  • 73

2 Answers2

1

Simple answer: You cannot. Precedence (and associativity) only apply to productions (on the left) and terminals (on the right). They do not apply to non-terminals.

That's not an arbitrary decision; it's inherent to the way bison handles resolution of shift/reduce conflicts. At every parsing step, the lookahead token (a terminal) must either eventually be shifted, but it is possible that there is a production which could be reduced before the terminal is shifted. If the reduction is not performed immediately, it will never be performed. An LR(1) grammar allows the parser to decide based on the current parse stack and the lookahead token whether a reduction should be performed or whether the lookahead token should be immediately shifted. If both actions are possible, then the grammar is said to have a shift/reduce conflict, and it is not strictly speaking LR(1).

Precedence and associativity rules are used to resolve shift/reduce conflicts. Productions may have an implicit or explicit precedence: the explicit precedence is provided by the %prec declaration; otherwise the precedence of the last terminal in the production is used. In the event of a shift/reduce conflict, the precedence of the production which could be reduced is compared to the precedence of the lookahead terminal which could be shifted, and whichever has the greater precedence wins. That's it. The precedence is not retained or inherited. In fact, it is inaccurate to say that the precedences are compared, since that doesn't happen during the parse; the parser has an action or transition table which defines what to do in the case of a particular stack configuration ("state") and lookahead token, and the precedence information is used at parser-generation time to fill in the entries in the action table which would otherwise be ambiguous.

In the case of your production

binary_op: '|' %prec BINARY_OP

the %prec declaration is useless, because binary_op must always be reduced immediately; it cannot participate in a shift/reduce conflict. The shift/reduce conflict comes with the non_keyword_expression production, which is marked with a (different) %prec declaration, and that is the declaration which will be used for that production.

The production for non_keyword_expression does not have a terminal, so it has no implicit precedence either. That's generally not what you want, and the use of productions like:

binary_op: '|' | "OR" ;

are not really compatible with the use of precedence to resolve parsing conflicts.


Note 1: This is not completely true if you ask for a GLR parser. The GLR parser can perform both the shift and the reduce, because it (effectively) maintains a number of parser states at the same time. Eventually, all but one of these states must be eliminated; otherwise, the parse is ambiguous. GLR parsers use precedence (and %prec declarations) in precisely the same way that non-GLR parsers do; a parser action eliminated by precedence is really eliminated and does not lead to parallel states. However, a GLR parser can also deal with reduce/reduce conflicts, in which there are two possible reductions (possibly to the same non-terminal). These conflicts can be resolved using %dprec ("dynamic precedence") declarations.

rici
  • 234,347
  • 28
  • 237
  • 341
  • Thanks for the answer. Are you are saying '|' in my grammar isn't a terminal? Confused why it would work properly for "OR" (token BINARY_OP) and not '|' (implied token '|') The associativity for BINARY_OP seems to be successfully attached to non-terminal binary_op... – nielsbot May 16 '14 at 21:13
  • @nielsbot: Sorry, I wasn't very clear. I'll edit my answer. One question, though: are you using a GLR-parser? (If so, you should mention it in your question, because it is neither obvious nor common.) – rici May 16 '14 at 22:48
  • actually I think I tagged the question with GLR, but I should mention it in the text – nielsbot May 16 '14 at 23:24
  • I guess I don't understand how to tell my parser to prefer `(non_keyword_expr binary_op non_keyword_expr) binary_op non_keyword_expr` over `non_keyword_expr binary_op (non_keyword_expr binary_op non_keyword_expr)` That's the ambiguity it's reporting. It works if the binary_op token is "OR" but not when the token is '|'. – nielsbot May 16 '14 at 23:34
  • @nielsbot: I'd need to see a lot more of your grammar to understand that issue, but you can easily solve the problem you mention either by using two productions (`non_keyword_expr BINARY_OP non_keyword_expr` and `non_keyword_expr '|' non_keyword_expr`), in which case the operator precedence should work, or by using two non-terminals, in the traditional `expr/factor/term` style. – rici May 16 '14 at 23:39
  • -1 since i don't see an answer here: once the parser reaches the binary_op nonterminal it will see the lookahead which is either '|' or the one terminal prefix of the BINARY_OP non-terminal (i.e 'OR' or 'AND', etc), in both cases of which a LR(1) parser could make the reduction first and then shift and reduce the BINARY_OP. the question is how to do this in bison - *it is a theorectically valid question for an LR(1) parser*, as far as i can see. – user3125280 Aug 23 '14 at 01:56
  • to be more specific " the use of productions like: `binary_op: '|' | "OR" ;` are not really compatible with the use of precedence to resolve parsing conflicts." seems like a bison constraint, not a LR(1) constraint. – user3125280 Aug 23 '14 at 02:01
  • @user3125280: precedence is not a concept from LR(1) grammars. It is part of the bison parsing algorithm. So you're 100% correct, it is a bison constraint, and that's what I said in the answer: "it's inherent to the way **bison** handles resolution of shift/reduce conflicts". The whole answer is about bison, in fact, but since the question includes `%prec` and `%dprec` tokens, which are bison-specific, I don't see why that should be cause for concern. – rici Aug 23 '14 at 02:14
  • @rici (i know you know what you're talking about) the answer reads like a LR(1) parser can't do this, but of course they can. precedence (obviously not a part of LR) allows to specify these kinds reductions neatly. bison's precedence is lacking here, and i feel it is an arbitrary decision. if the bison implementation of precednece was based on terminal prefixes of the following possible productions, rather than terminals themselves, this would be fixed. maybe i was unfair, but it's onyl 2 rep – user3125280 Aug 23 '14 at 02:31
  • @user3125280: I'm sure the bison maintainers will accept your patch. – rici Aug 23 '14 at 02:37
  • @rici okay, im sorry, add "in theory.." to everything i said, because obviously it's not that simple. i'm just saying OP's expectation is not unreasonable – user3125280 Aug 23 '14 at 02:41
1

Bison's precedence for rules works by comparing the rule's precedence to the precedence of all conflicting tokens to shift, when resolving an s/r conflict. So it compares BINARY_SEND_PREC to the precedence of '|' and 'OR'. For 'OR' it picks the reduce. To get the reduce for '|' as well the token '|' itself would need to be %left '|'. for them to work together '|' and 'OR' need the same precedence.

There is a workaround for this kind of problem if you can specify the associativty of terminals 'OR' and '|', etc and set their precedence the same. With a couple changes the infix calculator example can parse inputs like this:

2 PLUS -3 TIMES 4 ^ 2 + 3

-43

The major changes look like this:

%token PLUS
%token TAKE
%left '-' '+' PLUS TAKE

... 

add:      '+' | PLUS;
exp:      NUM                           { $$ = $1;         }
        | exp add exp        %prec '+'  { $$ = $1 + $3;    }

Precedence on non-terminals would be a useful extension to bison IMHO. It would allow the user to fix s/r conflicts by favouring the reduction when the a prefix of the non-terminal could be shifted (and only when it could be shifted for a non-terminal with precedence - there may be other valid reasons to shift). In fact I found this question after trying to implement a haskell style function application syntax ie

 x y z -> ((x y) z)

but because a single terminal alone is also valid here, reducing an x/y/z to the non-terminal is valid. Therfore bison would reach non-term_x non-term_y | z (| is the stack/lookahead boundary) and not know whether to reduce to non-term_x_y or shift z. (A similar trick works here fortunately)

I dug around in the bison source a little, but I can't see a simple way to allow %prec on non-terminals. When s/r conficts are resolved, only the reducing rule is known and the conflicting token to shift, the precedences of which are compared. You would need to know all valid shifiting reasons here, and there are ways to access the confliciting shift rules, so maybe.. you would need to divide the shiftable tokens into groups corresponding to the rules they would eventually reduce by, and then compare rules' precedences. Somethin I'll look at someday...

user3125280
  • 2,779
  • 1
  • 14
  • 23