It was a choice made fairly early in the development of Python, I believe. There's no theoretical impediment to allowing `3 + not 2` to be parsed as you would expect, but Python happens not to allow that.
Contrary to what appears to be a common belief here, it has nothing to do with the operator precedence algorithm. Although operator precedence is used (slightly inexactly) as a documentation aid, the actual Python algorithm uses a context-free grammar, and that grammar explicitly specifies that an expression headed by the `not` operator cannot be used as the operand of an arithmetic operation or a comparison. As a result, expressions as simple as `- not False` and `True == not False` also produce syntax errors.
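You can check this with `compile`, which is a quick way to see what CPython's parser accepts (the exact error message varies by version):

```python
# Each unparenthesized form is rejected by CPython's parser,
# while the parenthesized equivalent is accepted.
for src in ('3 + not 2', '- not False', 'True == not False'):
    try:
        compile(src, '<test>', 'eval')
        print(src, '-> parsed')
    except SyntaxError:
        print(src, '-> SyntaxError')

compile('3 + (not 2)', '<test>', 'eval')          # fine
compile('True == (not False)', '<test>', 'eval')  # also fine
```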
Here's an excerpt of the grammar, from the Python reference manual. This grammar is used to generate the Python parser, so it is authoritative, even if it is not totally readable as documentation. I spaced the productions out in order to make the pattern a bit more obvious. (I also left out a lot of productions, not all of which are totally irrelevant; consult the original if you seek more details.)
```
not_test:   'not' not_test | comparison
comparison: expr ( comp_op expr )*
expr:       xor_expr ( '|' xor_expr )*
xor_expr:   and_expr ( '^' and_expr )*
and_expr:   shift_expr ( '&' shift_expr )*
shift_expr: arith_expr ( ('<<'|'>>') arith_expr )*
arith_expr: term ( ('+'|'-') term )*
term:       factor ( ('*'|'@'|'/'|'%'|'//') factor )*
factor:     ( '+'|'-'|'~') factor | power
power:      atom_expr ['**' factor]
```
You can see from that grammar that none of the productions starting with `comparison` can include a `not_test`. This is what defines the precedence of `not` relative to the comparison operators (so that `not a == b` is equivalent to `not (a == b)`). But that doesn't just prevent `not a` from being misparsed as the left-hand operand of `==`; it also prevents `not a` from being used as the right-hand operand of `==`, which is why `True == not False` is a syntax error rather than a tautology. And since all the other operators in that excerpt bind more tightly than the comparison operators, they all have the same relationship with `not`.
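The `ast` module makes the grouping visible: the parenthesized and unparenthesized forms produce identical trees.

```python
import ast

# 'not a == b' groups as 'not (a == b)': both parses yield the same AST.
tree1 = ast.dump(ast.parse('not a == b', mode='eval'))
tree2 = ast.dump(ast.parse('not (a == b)', mode='eval'))
assert tree1 == tree2
print(tree1)  # Expression(body=UnaryOp(op=Not(), operand=Compare(...)))
```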
Some languages (like those derived from C) do not reduce the precedence of `not` in this way. In C, the boolean negation operator `!` has the same precedence as the other unary operators, so it binds more tightly than comparison operators (or arithmetic operators). Thus, in C, `! a < b` really does mean `(!a) < b`, even though that might not be as useful. But many other languages, including most SQL dialects, place `NOT` in the precedence chart at the same level as Python does: between the binary boolean operators and the comparison operators. [Note 2] Moreover, most SQL dialects do allow the use of `NOT` as an operand to a comparison operator. And nonetheless, all of these grammars are based on the same context-free formalism. [Note 3]
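The `sqlite3` module from the standard library is an easy way to probe one such dialect. SQLite places `NOT` between `AND` and the comparison operators, yet, unlike Python, still accepts it as a comparison operand:

```python
import sqlite3

con = sqlite3.connect(':memory:')

# NOT binds more loosely than '<', so this is NOT (1 < 2), i.e. 0:
print(con.execute('SELECT NOT 1 < 2').fetchone()[0])   # 0

# But unlike Python, SQLite happily parses NOT as an operand of '=':
print(con.execute('SELECT 1 = NOT 0').fetchone()[0])   # 1
```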
So how do they do that? Writing out an unambiguous context-free grammar which allows `not` to be used as a unary operator even in the operands of comparison and arithmetic operators is a challenging exercise, and reading the resulting grammar is also non-trivial. However, a very old parsing technique makes creating such a parser almost trivial. That technique is called "operator precedence", and you'll find it in almost every modern parsing framework. As we'll see, however, it is often misunderstood, and not always well implemented.
As an example, here's a simple AST builder written with the help of Sly. This is a single file, but I split it into three code blocks for readability. First, the lexer, which is not really relevant here:
```python
from sly import Lexer, Parser

class CalcLexer(Lexer):
    tokens = { NAME, NUMBER, POW, LE, GE, NE, AND, OR, NOT, TRUE, FALSE }
    ignore = ' \t'
    literals = { '=', '<', '>', '+', '-', '*', '/', '(', ')' }

    # Multicharacter symbol tokens
    LE = r'<='
    GE = r'>='
    NE = r'!=|<>'
    POW = r'\*\*'

    # Keyword tokens (and identifiers)
    NAME = r'[a-zA-Z_][a-zA-Z0-9_]*'
    NAME['and'] = AND
    NAME['false'] = FALSE
    NAME['not'] = NOT
    NAME['or'] = OR
    NAME['true'] = TRUE

    @_(r'\d+')
    def NUMBER(self, t):
        t.value = int(t.value)
        return t

    @_(r'\n+')
    def newline(self, t):
        self.lineno += t.value.count('\n')

    def error(self, t):
        print("Illegal character '%s'" % t.value[0])
        self.index += 1
```
Now, the parser. Note the precedence definitions, near the top:
```python
class CalcParser(Parser):
    tokens = CalcLexer.tokens

    precedence = (
        ('left', OR),
        ('left', AND),
        ('right', NOT),
        ('left', '=', NE),
        ('left', '<', LE, GE, '>'),
        ('left', '+', '-'),
        ('left', '*', '/'),
        ('right', UMINUS),
        ('right', POW),
    )

    @_('expr')
    def statement(self, p):
        print(p.expr)

    @_('expr OR expr')
    @_('expr AND expr')
    @_('expr "=" expr')
    @_('expr NE expr')
    @_('expr "<" expr')
    @_('expr LE expr')
    @_('expr GE expr')
    @_('expr ">" expr')
    @_('expr "+" expr')
    @_('expr "-" expr')
    @_('expr "*" expr')
    @_('expr "/" expr')
    @_('expr POW expr')
    def expr(self, p):
        return [p[1], p.expr0, p.expr1]

    @_('"-" expr %prec UMINUS')
    @_('NOT expr')
    def expr(self, p):
        return [p[0], p.expr]

    @_('"(" expr ")"')
    def expr(self, p):
        return p.expr

    @_('NUMBER')
    @_('NAME')
    @_('TRUE')
    @_('FALSE')
    def expr(self, p):
        return p[0]
```
Finally, a simple driver:
```python
if __name__ == '__main__':
    try:
        import readline
    except ImportError:
        pass

    lexer = CalcLexer()
    parser = CalcParser()
    while True:
        try:
            text = input('calc > ')
        except EOFError:
            break
        if text:
            parser.parse(lexer.tokenize(text))
```
And a quick test, which shows that it has the (hopefully) expected behaviour:
```
$ python3 calc.py
calc > 3+4
['+', 3, 4]
calc > 3+not 4
['+', 3, ['not', 4]]
calc > not 3 + 4
['not', ['+', 3, 4]]
calc > not 3*4<7
['not', ['<', ['*', 3, 4], 7]]
calc > not 3*4<7 and not true
['and', ['not', ['<', ['*', 3, 4], 7]], ['not', 'true']]
```
Before LR parsing became practical (with Frank DeRemer's efficient algorithm for constructing LALR(1) parsers), it was common to use so-called "operator-precedence" parsers, which were first described in 1963 in a paper by Robert W. Floyd, *Syntactic analysis and operator precedence*. [Note 4] Operator precedence parsers are shift-reduce parsers whose shift and reduce actions are chosen by consulting a matrix of "precedence relations", indexed by the operator on top of the parser stack and the operator which appears next in the input stream. Each entry in the matrix has one of four possible values:
- ⋖, indicating that the input symbol should be shifted.
- ⋗, indicating that the stack symbol should be reduced.
- ≐, indicating that the stack symbol and the lookahead symbol belong to the same production.
- empty, indicating a syntax error.
Despite their resemblance to comparison operators, the precedence relations described by Floyd are not even a partial ordering: they are neither transitive nor antisymmetric. But in most practical grammars, they can be reduced to a numerical comparison, with two major caveats:
1. The original matrix of relations contains empty entries, representing illegal syntax. Reducing the relations to numerical comparison gives every entry a defined value, so the resulting parser will fail to detect some syntax errors.
2. Because the precedence relations are not really comparisons, it is generally not possible to represent them with a single numeric mapping. You need to use two functions, the "left precedence" and the "right precedence", one used for the operator on the left and the other for the operator on the right.
In some sense, the first of these issues is not so serious, because Floyd's algorithm itself does not necessarily detect all syntax errors. In effect, operator precedence parsing erases the difference between different non-terminals; all reduction actions produce a kind of generic non-terminal, which might lose information needed to distinguish between different syntactic cases. If the information loss leads to an inability to decide between two different reductions, then the grammar cannot be parsed with operator precedence. But it is more common for the information loss to lead to simply being unable to detect syntax errors, which might be considered acceptable.
In any case, Floyd's algorithm became extremely popular. It was used to build parsers for many languages, and gave rise to some paradigms still in popular use, such as the shunting yard algorithm.
In general, computing a precedence matrix and then reducing it to a pair of functions is hard to do by hand. But in one specific area -- algebraic expressions -- the results are so intuitive that they have come to replace the original definition. If you want to parse an algebraic expression without parentheses and without unary operators, you can do so by simply grouping the operators into precedence levels. Each level is assigned two consecutive integers (and no two levels use the same integers), used as the left- and right-precedence values.
There are two integers in order to distinguish between left- and right-associative operators: for a left-associative level, the left-precedence is the greater of the two integers; for a right-associative level, the right-precedence is greater.
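As an illustration (my own sketch, not Floyd's original formulation), here is the consecutive-integer scheme driving a bare-bones shift-reduce loop for binary operators only, with left-associative `+` and `*` and right-associative `**`:

```python
# Each precedence level gets two consecutive integers. Left-associative
# levels put the larger one in LEFT_PREC; right-associative levels put
# it in RIGHT_PREC. '$' is an end marker below everything.
LEFT_PREC  = {'+': 2, '*': 4, '**': 5, '$': 0}
RIGHT_PREC = {'+': 1, '*': 3, '**': 6, '$': 0}

def parse(tokens):
    """Shift-reduce parse of an alternating operand/operator token list."""
    tokens = tokens + ['$']
    operands, operators = [tokens[0]], ['$']
    i = 1
    while True:
        op = tokens[i]
        if LEFT_PREC[operators[-1]] < RIGHT_PREC[op]:
            # Shift the operator and its following operand.
            operators.append(op)
            operands.append(tokens[i + 1])
            i += 2
        elif operators[-1] != '$':
            # Reduce the operator on top of the stack.
            right, left = operands.pop(), operands.pop()
            operands.append([operators.pop(), left, right])
        else:
            return operands[0]

print(parse(['1', '+', '2', '+', '3']))    # ['+', ['+', '1', '2'], '3']
print(parse(['2', '**', '3', '**', '2']))  # ['**', '2', ['**', '3', '2']]
```

With these tables, `1+2+3` reduces the leftmost `+` first, while `2**3**2` shifts the second `**`, grouping to the right.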
That's not the only solution. You'll find many others. For example, it's pretty common to see this described as a single integer accompanied by a two-state enumeration (Associativity), where the Associativity is taken into account if the two integers compared are equal. That "simplification" will not work for an operator precedence grammar in general, since parentheses and unary operators cannot be simply handwaved away from the language. But it might be acceptable if in the context that precedence rules are applied, the parsing framework has already dealt with the parentheses.
There is no problem handling parentheses with an operator precedence parser. The problem arises when you insist that the left- and right-precedence of a symbol be consecutive integers (or a single integer and a flag):

- `(` is always shifted onto the stack, meaning that its right precedence is higher than the left precedence of any operator.
- When `(` is at the top of the stack, no operator causes it to be reduced, meaning that its left precedence is lower than the right precedence of any operator. (It is eventually reduced by the matching `)`.)
Unary operators have a similar asymmetry. A prefix operator is also always shifted, since there is no preceding non-terminal to reduce. But, as the case of `NOT` shows, the operator does not necessarily bind more tightly than any following operator. So its left precedence and right precedence can differ greatly.
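To make the asymmetry concrete, here is a small recursive sketch (a precedence-climbing variant, with made-up binding powers) in which a prefix operator is always shifted, but `not` resumes parsing its operand at a low precedence, so comparisons bind more tightly than it:

```python
# Hypothetical precedences. A prefix operator is always accepted,
# but 'not' parses its operand at a LOW precedence, so
# 'not 3 * 4 < 7' groups as 'not ((3 * 4) < 7)'.
PREFIX_PREC = {'not': 1, '-': 9}
INFIX_PREC = {'<': (3, 4), '+': (5, 6), '*': (7, 8)}  # (left, right)

def parse(tokens, min_prec=0):
    tok = tokens.pop(0)
    if tok in PREFIX_PREC:                    # always "shift" a prefix op
        left = [tok, parse(tokens, PREFIX_PREC[tok])]
    else:
        left = tok
    while tokens and tokens[0] in INFIX_PREC:
        left_prec, right_prec = INFIX_PREC[tokens[0]]
        if left_prec < min_prec:              # operator binds too loosely
            break
        op = tokens.pop(0)
        left = [op, left, parse(tokens, right_prec)]
    return left

print(parse(['not', '3', '*', '4', '<', '7']))
# ['not', ['<', ['*', '3', '4'], '7']]
```

Here `not` has right precedence 1 (everything binds more tightly than its operand boundary), while unary `-` has right precedence 9, so it grabs only the immediately following factor.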
(to be continued)
Notes:
Note 1: If we just wanted to check the syntax of an expression, we could eliminate the recursive productions in `not_test` and `factor`, which might be a source of confusion. We could instead write:

```
not_test: ( 'not' )* comparison
factor: ( '+'|'-'|'~'|atom_expr '**')* atom_expr
```
But that would not produce a correct parse, in the sense that it does not associate operators with their arguments. It is essential that exponentiation be applied right-to-left, for example. Moreover, we cannot in general evaluate `not` by somehow coalescing a sequence of `not` prefix operators into a single operator.
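For example, two `not`s do not simply cancel out; `not not x` normalizes `x` to a `bool`:

```python
x = 5
print(not x)        # False
print(not not x)    # True -- i.e. bool(5), not 5 itself
```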
Note 2: This just goes to show how arbitrary operator precedence can be. There really is no universal standard; there are even languages in which all operators have the same precedence, so that grouping is strictly right-to-left.
Note 3: The new Python parser is actually based on a Parsing Expression Grammar (PEG), which is also the formalism `pyparsing` implements. PEGs are radically different from context-free grammars, but in this particular little corner the differences are not important.
Note 4: Outrageously, the ACM (in whose journal the article first appeared) believes that it is justifiable to charge fifteen dollars so that people can read an 18-page paper published nearly 60 years ago. Here's a link to the paywall, in case you find it useful.