How to remove ambiguity in following grammar?
E -> E * F | F + E | F
F -> F - F | id
How to remove ambiguity in following grammar?
E -> E * F | F + E | F
F -> F - F | id
First, we need to find the ambiguity.
Consider the rules for E without F; change F to f and consider it a terminal symbol. Then the grammar
E -> E * f
E -> f + E
E -> f
is ambiguous. Consider f + f * f:
E E
| |
+-------+--+ +-+-+
| | | | | |
E * f f + E
+-+-+ |
| | | +-+-+
f + E E * f
| |
f f
We can resolve this ambiguity by forcing * or + to take precedence. Typically, * takes precedence in the order of operations, but this is totally arbitrary.
E -> f + E | A
A -> A * f | f
Now, the string f + f * f has just one parsing:
E
|
+-+-+
| | |
f + E
|
A
|
+-+-+
A * f
|
f
Now, consider our original grammar which uses F instead of f:
E -> F + E | A
A -> A * F | F
F -> F - F | id
Is this ambiguous? It is. Consider the string id - id - id.
E E
| |
A A
| |
F F
| |
+-----+----+----+ +----+----+----+
| | | | | |
F - F F - F
| | | |
+-+-+ id id +-+-+
F - F F - F
| | | |
id id id id
The ambiguity here is that - can be left-associative or right-associative. We can choose the same convention as for +:
E -> F + E | A
A -> A * F | F
F -> id - F | id
Now, we have only one parsing:
E
|
A
|
F
|
+----+----+----+
| | |
id - F
|
+--+-+
| | |
id - F
|
id
Now, is this grammar ambiguous? It is not.
That s has exactly #(+) +s, #(*) *s and #(-) -s can be taken for granted (the numbers can be zero if not present in s). That E -> A, A -> F and F -> id have to be used exactly once can be shown as follows:
If E -> A is never used, any string derived will still have E, a nonterminal, in it, and so will not be a string in the language (nothing is generated without taking E -> A at least once). Also, every string that can be generated before using E -> A has at most one E in it (you start with one E, and the only other production keeps one E) so it is never possible to use E -> A more than once. So E -> A is used exactly once for all derived strings. The demonstration works the same way for A -> F and F -> id.
That E -> F + E, A -> A * F and F -> id - F are used exactly #(+), #(*) and #(-) times, respectively, is apparent from the fact that these are the only productions that introduce their respective symbols and each introduces one instance.
If you consider the sub-grammars of our resulting grammars, we can prove they are unambiguous as follows:
F -> id - F | id
This is an unambiguous grammar for (id - )*id
. The only derivation of (id - )^kid
is to use F -> id - F
k times and then use F -> id
exactly once.
A -> A * F | F
We have already seen that F is unambiguous for the language it recognizes. By the same argument, this is an unambiguous grammar for the language F( * F)*
. The derivation of F( * F)^k
will require the use of A -> A * F
exactly k times and then the use of A -> F
. Because the language generated from F
is unambiguous and because the language for A
unambiguously separates instances of F using *, a symbol not generated by F, the grammar
A -> A * F | F
F -> id - F | id
Is also unambiguous. To complete the argument, apply the same logic to the grammar generating (F + )*A from the start symbol E.
To remove an ambiguity means that you must choose one of all possible ambiguities. This grammar is as simple as it can be, for a mathematical expression.
To make the multiplication with a higher priority than the addition and the subtraction (where the last two have the same priority, but are traditionally computed from left to right) you do that (in ABNF like syntax):
expression = addition
addition = multiplication *(("+" / "-") multiplication)
multiplication = identifier *("*" identifier)
identifier = 'a'-'z'
The idea is as follows:
*
X *(P X)
, where X
is the previous rule you have created, and P
is your operation sign.(P1 / P2 / ...)
Then for input like: a+b+c*d+e
you get this tree:
More advanced tools will get you a tree that has more than two nodes. That means that all multiplications in one addition will be in a list that you can iterate from any direction.
This grammar is easy to upgrade, and to add parentheses you can do that:
expression = addition
addition = multiplication *(("+" / "-") multiplication)
multiplication = primary *("*" primary)
primary = identifier / "(" expression ")"
identifier = 'a'-'z'
Then for input (a+b)*c
you will get this tree:
If you want to add a division, you can modify the multiplication
rule like that:
multiplication = primary *(("*" / "/") primary)
These are all detailed trees, there are trees with less details as well, often called abstract syntax trees.