5

I'm trying to parse expressions made of the binary operator +, the unary operator not and identifiers that can be any alphabetical string that isn't not

from pyparsing import (
    CaselessKeyword,
    Combine,
    Word,
    alphas,
    opAssoc,
    infixNotation,
)

identifier = Combine(~CaselessKeyword('not') + Word(alphas))
expression = infixNotation(identifier, [
  ('+', 2, opAssoc.LEFT),
  (CaselessKeyword('not'), 1, opAssoc.RIGHT),
]

Running

expression.parseString('a + (not b)')

gives what I expect

[['a', '+', ['not', 'b']]]

However, without the parentheses

expression.parseString('a + not b')

I only get the first token:

['a']

How can I define the language to work as I would like without the parentheses?

(In the real case there are more operators and reserved words: this is a step towards parsing the S3 Select language)

Michal Charemza
  • 25,940
  • 14
  • 98
  • 165
  • 'not' should have higher precedence than '+' - just reverse the order of the two lines in the list you are passing to `infixNotation`. – PaulMcG Aug 24 '20 at 14:06
  • @PaulMcG I think that causes the language to be different? For example, `'not a + b'` gets parsed differently depending on the precedence order. Note: what I'm actually doing is trying to parse the S3 Select language, where `NOT` is listed as a lower than the binary `+` https://docs.aws.amazon.com/AmazonS3/latest/dev/s3-glacier-select-sql-reference-operators.html – Michal Charemza Aug 25 '20 at 07:38
  • Do you insist on the use of pyparsing or would a different parsing framework be acceptable? – rici Aug 31 '20 at 19:20
  • @rici Another parsing framework would be acceptable – Michal Charemza Sep 01 '20 at 20:12

1 Answers1

5

In S3 NOT is higher that AND:

Operator Precedence The following table shows the operators' precedence in decreasing order.

(from S3 amazon site).

In that table NOT is above AND.

So your code should be:

identifier = Combine(~CaselessKeyword('not') + Word(alphas))
expression = infixNotation(identifier, [
    (CaselessKeyword('not'), 1, opAssoc.RIGHT),
    ('+', 2, opAssoc.LEFT),
])

BTW - If "NOT is listed as a lower than the binary +", than a + not b is not valid expression. + needs two operators: one is a, but not b is not valid operand.

BTW2 (from comments): Please don't mix + which is an arithmetic operator and NOT which is a logic operator in the same expression. 1 + not 2 is not a valid expression. Every language decide how to parse that's kinds of strange expressions.

zvi
  • 3,677
  • 2
  • 30
  • 48
  • 1
    "In S3 `NOT` is higher that `+`" the table is in decreasing order, so the operators further down the table are lower than those higher, so `NOT` is _lower_ than `+` in S3? Also, how is `AND` involved? And... why would `not b` not be a valid operand to `+` in `a + not b`? Instinctively to me somehow, it has to be equivalent to `a + (not b)`... – Michal Charemza Aug 31 '20 at 08:22
  • You can't mix `+` which is arithmetic operation and `NOT` which is logic is same expression. `1 + not 2` is not valid. I think this is the misunderstand here. (I edit my typo in my answer). – zvi Aug 31 '20 at 08:30
  • Ah: so understood that the expression may not make sense in terms of types. However, that should be a phase after parsing of the expression? In several languages, you _can_ apply `NOT` to a number, e.g. in Python `not 3 + 2` is fine. Note also, swapping the order of precedence makes `not 3 + 2` get parsed differently, so it feels, well, wrong, if trying to parse a language specified to have the original order? – Michal Charemza Aug 31 '20 at 08:47
  • In python you can write `not 2` but it has different meaning: `2` is parse as `True` so `not True` is `False`, that's why `not 0` is `True`. Every language decide how to parse that's kinds of expressions. – zvi Aug 31 '20 at 09:07
  • The best way I have found to address BTW2 is to break the infixNotation expression up into an arithmetic_expr (using just arithmetic operators), a comparison_expr (using `=`, `!=`, `>` etc.) operators, and logical_expr (using just NOT, AND, and OR). Doing this work in the BNF up front will also help. – PaulMcG Aug 31 '20 at 20:17