There is no ambiguity. That does not mean there is no parsing conflict.
An ambiguous grammar cannot be parsed with an LR(k) parser; attempting to generate the parsing automaton will necessarily fail because of the existence of two different derivations for the same sentence. But the reverse is not true. A grammar with a conflict may or may not be ambiguous. [Note 1]
To be parsed by an LR(k) grammar, every non-terminal needs to be resolved before more than k more symbols are read.
So, while you are quite correct that
LEFT_PARENTHESIS NAME RIGHT_PARENTHESIS
must be a formula
, because there is no RELATION_INFIX
.
that does not tell the whole story. For LEFT_PARENTHESIS NAME RIGHT_PARENTHESIS
to be a formula
, NAME
must also be a formula
, and the reduction of NAME
to formula
must occur within 1 symbol (because k is always one in a bison LR parser). But only reading one symbol after the NAME
, it is not possible to know that "there is no RELATION_INFIX
". Hence the reduce-reduce conflict.
If I understand your note correctly, what you have is basically an expression grammar where there are two types of expressions ("formulas" and "terms") with different operators, where identifiers can be primitive variables or the name of an expression of either type. A restriction is that the operand of a "term" operator can only be a "term", while the operand of a "formula" operator can be either a "formula" or a "term". (That is not reflected in the simplified grammar, which does not allow term AND (term AND term)
(with or without the parentheses); I'm just assuming that it is desired because otherwise it would be impossible for formulas to be complicated, but your note says that they often are.)
If a phrase includes a visible operator, the syntactic category ("formula" or "term") is evident, but a syntactic phrase consisting only of an identifier, possibly surrounded by redundant parentheses, can be either a "term" or a "formula". That makes it essentially impossible to completely express the restriction syntactically without knowing the syntactic category of each identifier.
If the syntactic category of every identifier is known before the first use of the identifier, then lexical feedback could be used to allow complete enforcement of the restriction during parsing. Otherwise, or if lexical feedback is considered undesirable, it will be necessary to validate any subexpression containing an identifier during a subsequent semantic analysis pass, perhaps the same one which expands macro identifiers.
Since a semantic check needs to be made anyway, it is easy to also check that a formula
is never the operand to a term
operator. In that case an ordinary expression grammar can be used, including normal use of precedence declarations to simplify the presentation. This grammar will directly produce the correct AST for any correct input; the semantic check is only required to reject incorrect inputs. Doing the detection in this way also helps produce informative error messages. There is really very little down-side to this solution, IMHO.
But it is also reasonably straight-forward to write a grammar which expresses the syntactic restriction for cases other than names. The trick is to use three expression types, since there really are three syntactic categories, as shown by the above discussion:
evident formulas
evident terms
phrases which might be either formulas or terms (i.e. possibly parenthesized names)
Getting parenthesized subexpressions right is a bit annoying and adds clutter to the grammar. To my mind, there is no real benefit. But if you want to try this, here's a simple example, using exactly one term operator and one formula operator and sample reduction actions which insert explicit check/conversion operations into the AST:
formula: formula_not_term
| term_not_name { $$ = mkform("TERM->FORM", $1, NULL); }
| name { $$ = mkform("NAME->FORM", $1, NULL); }
formula_not_term
: formula FORMULA_OPERATOR formula
{ $$ = mkform("FORM_OP", $1, $3); }
| '(' formula_not_term ')'
{ $$ = $2; }
term : term_not_name
| name { $$ = mkterm("NAME->TERM", $1, NULL); }
term_not_name
: term TERM_OPERATOR term
{ $$ = mkterm("TERM_OP", $1, $3); }
| '(' term_not_name ')'
{ $$ = $2; }
name : NAME
| '(' name ')' { $$ = $2; }
Notes
- Determining whether a grammar ambiguous is undecidable, which means that no algorithm exists which will definitely answer the question for any grammar. If all ambiguous grammars produced LR(1) parsing conflicts, you could use the LR(1) parser construction to detect ambiguity, which would contradict the undecidability. So fnding unambiguous grammars which produce parsing conflicts is to be expected.