3

Imagine the following types of strings:

if ((a1 and b) or (a2 and c)) or (c and d) or (e and f)

Now, I'd like to get the expressions in parentheses, so I wrote a PEG parser with the following grammar:

from parsimonious.grammar import Grammar

grammar = Grammar(
    r"""
    program     = if expr+
    expr        = term (operator term)*
    term        = (factor operator factor) / factor
    factor      = (lpar word operator word rpar) / (lpar expr rpar)

    if          = "if" ws
    and         = "and"
    or          = "or"
    operator    = ws? (and / or) ws?

    word        = ~"\w+"
    lpar        = "("
    rpar        = ")"

    ws          = ~"\s*"
    """)

which parses just fine with

tree = grammar.parse(string)

Now the question arises: how to write a NodeVisitor class for this tree to get only the factors? My problem here is the second branch which can be deeply nested.


I tried with
def walk(node, level = 0):
    if node.expr.name == "factor":
        print(level * "-", node.text)

    for child in node.children:
        walk(child, level + 1)

walk(tree)

but to no avail, really (factors bubble up in duplicates).
Note: This question is based on another one on StackOverflow.

sophros
  • 14,672
  • 11
  • 46
  • 75
Jan
  • 42,290
  • 8
  • 54
  • 79

2 Answers2

2

How would I go about it to get ((a1 and b) or (a2 and c)), (c and d) and (e and f) as three parts?

You could create a visitor that "listens" when a node in the parse tree is a (, in which a depth-variable is increased, and when a ) is encountered, the depth-variable is decreased. Then in the method that is called that matches a parenthesised expression, you inspect the depth before adding it to your list of expressions to return from the visitor.

Here a is a quick example:

from parsimonious.grammar import Grammar
from parsimonious.nodes import NodeVisitor

grammar = Grammar(
    r"""
    program     = if expr+
    expr        = term (operator term)*
    term        = (lpar expr rpar) / word

    if          = "if" ws
    and         = "and"
    or          = "or"
    operator    = ws? (and / or) ws?

    word        = ~"\w+"
    lpar        = "("
    rpar        = ")"

    ws          = ~"\s*"
    """)


class ParExprVisitor(NodeVisitor):

    def __init__(self):
        self.depth = 0
        self.par_expr = []

    def visit_term(self, node, visited_children):
        if self.depth == 0:
            self.par_expr.append(node.text)

    def visit_lpar(self, node, visited_children):
        self.depth += 1

    def visit_rpar(self, node, visited_children):
        self.depth -= 1

    def generic_visit(self, node, visited_children):
        return self.par_expr


tree = grammar.parse("if ((a1 and b) or (a2 and c)) or (c and d) or (e and f)")
visitor = ParExprVisitor()

for expr in visitor.visit(tree):
    print(expr)

which prints:

((a1 and b) or (a2 and c))
(c and d)
(e and f)
Bart Kiers
  • 166,582
  • 36
  • 299
  • 288
  • Thanks. How would I go about it to get `((a1 and b) or (a2 and c))`, `(c and d)` and `(e and f)` as three parts? Sorry for not being clear in the original question here. The main question imo is how to write a recursive `NodeVisitor` class. – Jan Apr 01 '19 at 18:44
  • 1
    Ah, I see. Let me have a quick look. – Bart Kiers Apr 01 '19 at 18:51
  • That's a clever one! I'd wait another couple of days with the bounty but that is definately interesting. – Jan Apr 01 '19 at 19:39
  • 1
    Sure, no problem, I answered because I didn't know this `parsimonious` library, which looked nice. I might have a closer look at it. – Bart Kiers Apr 01 '19 at 19:42
1

If you only want to return each outermost factor, return early and do not descend into its children.

def walk(node, level = 0):
    if node.expr.name == "factor":
        print(level * "-", node.text)
        return
    for child in node.children:
        walk(child, level + 1)

Output:

----- ((a1 and b) or (a2 and c))
----- (c and d)
------ (e and f)
Jan-Gerd
  • 1,261
  • 8
  • 8
  • Thanks for your response. However I am merely looking for a `NodeVisitor` solution or a hint to reformulate the grammar. With the early returns one can't do anything with the output. – Jan Apr 01 '19 at 18:45