1

I have strings made up of Boolean terms and equations, like so

x=1 AND (x=2 OR x=3) AND NOT (x=4 AND x=5) AND (x=5) AND y=1

I would like to break up the x into groups that were separated by AND, while respecting parentheses as grouping operators. For example the result for the string above would be

[['x=1'], ['x=2', 'x=3'], ['x=4'], ['x=5'], ['x=5']]

x=2 and x=3 are in the same group because they are grouped by () and not separated by an AND. The last equation was ignored because it does not start with x.

UPDATE

Another example is

x=1 AND (x=2 OR (x=3 AND x=4))

where each equation should be separate

[['x=1'], ['x=2', [['x=3'], ['x=4']]]

The closest I found was this post but I don't know how to modify it to my needs.

badmax
  • 645
  • 2
  • 5
  • 12
  • You're right, I left a closing `)` in the example string. Your expected result is correct but I removed one level of the list because I don't need a nested structure like that, I thought it would make things easier. – badmax Mar 31 '20 at 22:56

2 Answers2

1

I suppose you could do something like this:

operators = ["AND NOT", "AND"]
sepChar = ":"
yourInputString = yourInputString.replace("(","").replace(")","") # remove the parenthesis

# Replace your operators with the separator character
for op in operators:
    yourInputString = yourInputString.replace(op,sepChar)

# output of your string so far
# yourInputString
# 'x=1 : x=2 OR x=3 : x=4 : x=5 : x=5 : y=1'

# Create a list with the separator character
operationsList = youtInputString.split(sepChar) 

# operationsList
# ['x=1', 'x=2 OR x=3', 'x=4', 'x=5', 'x=5', 'y=1']

# For the second result, let's do another operation list:
operators2 = ["OR"]
output = []

# Loop to find the other operators
for op in operationsList:
    for operator in operators2:
        if operator in op:
            op = op.split(operator)
    output.append(op)

# output:
# [['x=1'], ['x=2', 'x=3'], ['x=4'], ['x=5'], ['x=5'],['y=1']]
        

In this case, I used ":" as separation character, but you can change it according to your needs. Please let me know if this helps!

Edit

For a parenthesis nesting approach, I came with something brilliant:

import re
operators = ["AND NOT","AND","OR"]

# Substitute parenthesis
yourInputString = yourInputString.replace("(","[").replace(")","]")

# yourInputString
# "[x=1 AND [x=2 OR x=3] AND NOT [x=4 AND x=5] AND [x=5] AND y=1]"

# Replace your operators
for op in operators:
    yourInputString = yourInputString(op,",")

# yourInputString
# "[x=1 , [x=2 , x=3] , [x=4 , x=5] , [x=5] , y=1]"

# Find matches like x = 5 and substitue with 'x = 5'
compiler = re.compile(r"[xyz]{1}=\d")
matches = compiler.findall(yourInputString)

# matches
# ['x=1', 'x=2', 'x=3', 'x=4', 'x=5', 'x=5', 'y=1']

# Convert the list into unique outputs
matches = list(set(matches))

# matches
# ['x=1', 'x=2', 'x=3', 'x=4', 'x=5', 'y=1']

# Replace your matches to add quotes to each element
for match in matches:
    yourInputString = yourInputString.replace(match,f"'{match}'")


# yourInputString
# "['x=1' , ['x=2' , 'x=3'] , ['x=4' , 'x=5'] , ['x=5'] , 'y=1']"

# Here is the special move, convert your text into list
myList = eval(yourInputString)

# myList
# ['x=1', ['x=2', 'x=3'], ['x=4', 'x=5'], ['x=5'], 'y=1']
Nimantha
  • 6,405
  • 6
  • 28
  • 69
EnriqueBet
  • 1,482
  • 2
  • 15
  • 23
  • It works almost everywhere but not when there's nesting, as in `'x=1 AND (x=2 OR (x=3 AND x=4)`. In this case the parenthesis before `x=3` is not taken into account and the result is `[['x=1'], ['x=2 OR x=3'], ['x=4']],` when it should be `[['x=1'], ['x=2'], ['x=3'], ['x=4']]`. – badmax Mar 31 '20 at 19:14
  • 1
    Then another approach is needed a dynamic prog approach. Let me change that – EnriqueBet Mar 31 '20 at 19:17
  • I really like this approach, but for the nested string `'x=1 AND (x=2 OR (x=3 AND x=4)` the result is now `('x=1', ['x=2', ['x=3', 'x=4']])`. The `['x='3, 'x=4']` list should be separated since there's an `AND` between them. – badmax Mar 31 '20 at 20:11
  • 1
    Ok, let me think that over again – EnriqueBet Mar 31 '20 at 20:35
1

As you probably saw in that other question, parsing infix notation such as this is best done in pyparsing using the infixNotation helper (formerly named operatorPrecedence). Here are the basics for using infixNotation on your problem:

import pyparsing as pp

# define expressions for boolean operator keywords, and for an ident
# (which we take care not to parse an operator as an identifier)
AND, OR, NOT = map(pp.Keyword, "AND OR NOT".split())
any_keyword = AND | OR | NOT
ident = pp.ungroup(~any_keyword + pp.Char(pp.alphas))
ident.setName("ident")

# use pyparsing_common.number pre-defined expression for any numeric value
numeric_value = pp.pyparsing_common.number

# define an expression for 'x=1', 'y!=200', etc.
comparison_op = pp.oneOf("= != < > <= >=")
comparison = pp.Group(ident + comparison_op + numeric_value)
comparison.setName("comparison")

# define classes for the parsed results, where we can do further processing by
# node type later
class Node:
    oper = None
    def __init__(self, tokens):
        self.tokens = tokens[0]

    def __repr__(self):
        return "{}:{!r}".format(self.oper, self.tokens.asList())

class UnaryNode(Node):
    def __init__(self, tokens):
        super().__init__(tokens)
        del self.tokens[0]

class BinaryNode(Node):
    def __init__(self, tokens):
        super().__init__(tokens)
        del self.tokens[1::2]

class NotNode(UnaryNode):
    oper = "NOT"

class AndNode(BinaryNode):
    oper = "AND"

class OrNode(BinaryNode):
    oper = "OR"

# use infixNotation helper to define recursive expression parser,
# including handling of nesting in parentheses
expr = pp.infixNotation(comparison,
        [
            (NOT, 1, pp.opAssoc.RIGHT, NotNode),
            (AND, 2, pp.opAssoc.LEFT, AndNode),
            (OR, 2, pp.opAssoc.LEFT, OrNode),
        ])

Now try using this expr parser on a test string.

test = "x=1 AND (x=2 OR x=3 OR y=12) AND NOT (x=4 AND x=5) AND (x=6) AND y=7"

try:
    result = expr.parseString(test, parseAll=True)
    print(test)
    print(result)
except pp.ParseException as pe:
    print(pp.ParseException.explain(pe))

Gives:

x=1 AND (x=2 OR x=3 OR y=12) AND NOT (x=4 AND x=5) AND (x=6) AND y=7
[AND:[['x', '=', 1], OR:[['x', '=', 2], ['x', '=', 3], ['y', '=', 12]], NOT:[AND:[['x', '=', 4], ['x', '=', 5]]], ['x', '=', 6], ['y', '=', 7]]]

From this point, collapsing the nested AND nodes and removing non-x comparisons can be done using regular Python.

PaulMcG
  • 62,419
  • 16
  • 94
  • 130