Wanted: example(s) of PEG Grammar for deeply nested Python boolean expressions

Question

I would like an example of a PEG grammar for deeply nested Python boolean expressions. The PEG grammar can't be "very recursive" or this will lead to stack overflow. Support for '|', '&', '(' and ')' required.

Example of input: (a=1|b=1|c=1|d=1&e=2)&(f=2&g=2&h=2|i=3|j=3|k=3)

Ira Baxter · Answer 1 · 2019-03-09T17:10:38.470

To parse typical (especially Boolean logic) expressions, you hardly need a Peg parser; there's never really a need to backtrack. Pick any parsing engine and use it (including a Peg engine if you insist).

Here's a grammar that will work for your example:

EXPRESSION = DISJUNCTION ;
DISJUNCTION = CONJUNCTION ( '|' CONJUNCTION )*;
CONJUNCTION = TERM ( '&' TERM );
TERM = '~' TERM | '(' EXPRESSION ')' | CONDITION ;
CONDITION = VARIABLE '=' CONSTANT ;

If you want to implement this grammar using a hand-written, recursive descent parser that is easy to code, see my SO answer on how to do this: Is there an alternative for flex/bison that is usable on 8-bit embedded systems?

Regarding overflowing the stack: unless your expression is nested hundreds of levels ("parentheses") deep, on most windows or linux system, the available stack is far bigger than the stack demands of a parser applied to expressions that people write in practice. With huge expressions, pretty much people can't read them anyway so they don't occur. OP's example expression requires nesting of just a few stack entries.

If you write a grammar for a full-fledged programming language, and somebody writes a big, complex program in that langauge, you might risk overflowing the stack. I can tell you from experience with a compiler I built that a nesting of 256 (stack frames) fits easily in Windows 1Mb stack space, and that's enough to compile 2 million line program with every strange construct and deeply nested lexical scoping stunt known to mankind in it.

Hi Ira. Thanks for the great answer :) I actually need a PEG grammar to feed to a parser generator. What I meant by "nested" was that the grammar should ideally not create a binary tree - this leads to a very deep tree if the expression has many many ORs and ANDs. The grammar I have is right recursive (I think that's the proper term) and leads to stack overflow for very long expressions (even without parenthesis): https://github.com/kschiess/parslet/blob/master/example/boolean_algebra.rb — SoupMonster, Nov 15 '16 at 05:01
Thanks Ira. I've given up on using a PEG-grammar-to-parser-generator, because of the lack of debug support. I think I will go for your hand written approach! :) — SoupMonster, Nov 17 '16 at 10:23
@SoupMonster: (Yeah, I know this response is a "little" delayed...) Your grammar will be left or right recursive pretty much whereever you have protductions that process a list of items (e.g., AND ... and OR ...). While the parser will process the productions and perhaps left/right recurse deeply, it isn't that hard to get generate a "list" node for AND and a list node for "OR" ... so your AND/OR trees can be relatively shallow. I frankly don't understand how you can run out of stack space for any realistic expressions typed by people. — Ira Baxter, Mar 09 '19 at 17:14

score 1 · Answer 2 · answered Apr 05 '18 at 22:33

Here's a PEG grammar that supports the simple boolean expression you have above (and a little more), with the assumption that AND should bind more tightly than OR:

expr       = multiOR !.
multiOR    = multiAND (_wsp opOR _wsp multiAND)*
multiAND   = comparison (_wsp opAND _wsp comparison)*

comparison = varOrNum _wsp opCOMP _wsp varOrNum
             / '!'? var
             / '!'? '(' multiOR ')'

varOrNum   = var / num

var        = [a-z]i [a-z0-9_]i*
num        = '-'? [0-9]+ ('.' [0-9]+)?

opOR       = '|' '|'?
opAND      = '&' '&'?
opCOMP     = [<>=] '='? / [≤≥≠] / '!='

_wsp       = [ \t]*

Here you can see the result of applying this PEG to input like yours:

Obviously the comparison operators are a little overboard, and it supports both |/& and ||/&& for the boolean operators. Feel free to modify as desired.

Wanted: example(s) of PEG Grammar for deeply nested Python boolean expressions

2 Answers2