2

So I have a dataframe in pandas that consists of a column with Components, and a row with Constraints. These constraints decide in what category the components have to be filtered. Now these constraints are not very straight-forward, so I'm looking for a way to split them into multiple smaller, more readable constraints. So for example if a constraint is 'A and (B or C)', I want to split up into two rows 'A and B' and 'A and C'. Not all constraints are as easy as this example though.

Here's what a small selection of the dataframe might look like:

Componenent Constraint
123 A and (B or C)
456 ((MIRROR='ELECTRIC' and MIRRORCAMERA!='NO') or (MIRROR='MANUAL' and (MIRROR_RIGHT!='NO' or MIRROR_LEFT!='NO'))) and STEERWHEEL_LOCK='NO'
789 LENGTH='122' or (LENGTH='135' and BATTERY='551') or LENGTH='149' or (LENGTH='181' and (BATTERY='674' or (BATTERY='551' and CHARGER!='NO')))

or

import pandas as pd
dataex = {'Component': [123, 
                        456, 
                        789], 
          'Constraint': ["A and (B or C)", 
                         "((MIRROR='ELECTRIC' and MIRRORCAMERA!='NO') or (MIRROR='MANUAL' and (MIRROR_RIGHT!='NO' or MIRROR_LEFT!='NO'))) and STEERWHEEL_LOCK='NO'", 
                         "LENGTH='122' or (LENGTH='135' and BATTERY='551') or LENGTH='149' or (LENGTH='181' and (BATTERY='674' or (BATTERY='551' and CHARGER!='NO')))"]}
df_example = pd.DataFrame(data=dataex)

Like I said, I'm hoping to split all these into multiple rows (if needed), depending on the and's and or's and parenthesis in the constraint. So I have the following result in mind:

Component Constraint
123 A and B
123 A and C
456 STEERWHEEL_LOCK='NO' and MIRROR='ELECTRIC' and MIRRORCAMERA!='NO'
456 STEERWHEEL_LOCK='NO' and MIRROR='MANUAL' and MIRROR_RIGHT!='NO'
456 STEERWHEEL_LOCK='NO' and MIRROR='MANUAL' and MIRROR_LEFT!='NO'
789 LENGTH='122'
789 LENGTH='135' and BATTERY='551'
789 LENGTH='149'
789 LENGTH='181' and BATTERY='674'
789 LENGTH='181' and BATTERY='551' and CHARGER!='NO'

or

import pandas as pd
datares = {'Component':[123, 123, 456, 456, 456, 789, 789, 789, 789, 789],
           'Constraint':["A and B",
                         "A and C",
                         "STEERWHEEL_LOCK='NO' and MIRROR='ELECTRIC' and MIRRORCAMERA!='NO'",
                         "STEERWHEEL_LOCK='NO' and MIRROR='MANUAL' and MIRROR_RIGHT!='NO'",
                         "STEERWHEEL_LOCK='NO' and MIRROR='MANUAL' and MIRROR_LEFT!='NO'",
                         "LENGTH='122'",
                         "LENGTH='135' and BATTERY='551'",
                         "LENGTH='149'",
                         "LENGTH='181' and BATTERY='674'",
                         "LENGTH='181' and BATTERY='551' and CHARGER!='NO'"
                        ]}
df_result = pd.DataFrame(data=datares)

I've tried splitting the constraints on 'or' and dividing them into arrays and then looping over them to get the result, but with some of the more difficult constraints, you get arrays inside arrays inside arrays and then it gets very messy after a while. I've also tried making a sort of logic tree, but I haven't gotten that to work in Python yet.

I'm hoping some of you might have a good idea or module to help my with my problem. Thanks!

Timus
  • 10,974
  • 5
  • 14
  • 28
  • 2
    You will need to write a parser, detect the `(`/`)` opening/closing, use the `and`/`or` keywords, etc. to build your output while reading each string. There is no straightforward and robust pandas way to do this. Especially if you want to generalize to nested conditions. – mozway May 11 '23 at 06:35
  • There's a parser [here](https://stackoverflow.com/questions/2467590/dynamically-evaluating-simple-boolean-logic-in-python) you might be able to adapt – Nick May 11 '23 at 07:19

1 Answers1

0

From your description I think you need to put the expression in "disjunctive normal form" (DNF), which looks like Or(And(v1,v2), And(v1,v4), ...). The code below will do this using an extra package (DNF is common in electronics design automation). To install the package do pip3 install pyeda.

The code that splits an expression into the corresponding And expressions is below.

Notes:

  • the regular expressions matching the filter might need adjustments if you have more than numbers/letters in the filters (like question mark, etc.)
  • you have no example on how to negate a standalone variable. I used "not(A)"
  • if you would not have standalone variables (so only comparisons), the code would be much simpler
  • if you have more operators (like less than, etc.) the code will slightly more complex

Overall idea:

  • transform your expression into a boolean expression of the form (x & y) | z . This is done by making all comparisons a variable.
  • transform the boolean expression into DNF
  • replace back the variables with the original comparisons.
from pyeda.inter import *
import re
import pandas as pd
dataex = {'Component': [123, 
                        456, 
                        789], 
          'Constraint': ["A and (B or C)", 
                         "((MIRROR='ELECTRIC' and MIRRORCAMERA!='NO') or (MIRROR='MANUAL' and (MIRROR_RIGHT!='NO' or MIRROR_LEFT!='NO'))) and STEERWHEEL_LOCK='NO'", 
                         "LENGTH='122' or (LENGTH='135' and BATTERY='551') or LENGTH='149' or (LENGTH='181' and (BATTERY='674' or (BATTERY='551' and CHARGER!='NO')))"]}
df_example = pd.DataFrame(data=dataex)

def transform_expr(input_expression):

    def move_not_before(x):
        if '!=' in x.group(1):
            return '~'+x.group(1).replace("!=","=")
        else:
            return x.group(1)

    expr1 = (re.sub("([a-zA-Z0-9'_]*!=[a-zA-Z0-9'_]*)", move_not_before,input_expression))

    variables = {}
    values = {}
    current_key = 0
    expr2 = ""
    last_idx = 0
    # Make transformations to reach a boolean form
    for idx in re.finditer("([a-zA-Z0-9'_]*=[a-zA-Z0-9'_]*)", expr1):
        expr2 += expr1[last_idx:idx.span(1)[0]]
        if idx[1] in values:
            expr2 += values[idx[1]]
        else:
            expr2 += f'v{current_key}'
            variables[f'v{current_key}']=idx[1]
            values[idx[1]] = f'v{current_key}'
            current_key+=1
        last_idx = idx.span(1)[1]
    expr2 += expr1[last_idx:]

    expr3 = re.sub("and", "&", expr2)
    expr4 = re.sub("or", "|", expr3)
    expr5 = expr(expr4).to_dnf()

    result = []
    # We know expr5 is like Or(And(...),And(...)...), xs has the children
    for v in expr5.xs:
        # We remove the And(...)
        if "," in str(v):
            arr = str(v)[4:-1].replace(" ","").split(",")
        else:
            # For cases in which you have only one variable
            arr = [str(v)]
        r = []
        for x in arr:
            if x[0]=="~":
                variable_name = x[1:]
            else:
                variable_name = x

            if variable_name not in variables:
                # How do we negate a standalone variable?
                if x[0]=="~":
                    variable_value = f"not({variable_name})"
                else:
                    variable_value = variable_name
            else:
                variable_value = variables[variable_name]

            if x[0]=="~":
                r.append(variable_value.replace("=","!="))
            else:
                r.append(variable_value)

        result.append(" and ".join(r))

    return result

df_example['Constraint'] = df_example['Constraint'].map(transform_expr)

df_result = df_example.explode('Constraint')

pd.set_option("max_colwidth", None)
print(df_result)

And it will print:

   Component                                                         Constraint
0        123                                                            A and B
0        123                                                            A and C
1        456  MIRROR='ELECTRIC' and MIRRORCAMERA!='NO' and STEERWHEEL_LOCK='NO'
1        456    MIRROR='MANUAL' and MIRROR_RIGHT!='NO' and STEERWHEEL_LOCK='NO'
1        456     MIRROR='MANUAL' and MIRROR_LEFT!='NO' and STEERWHEEL_LOCK='NO'
2        789                                                       LENGTH='122'
2        789                                                       LENGTH='149'
2        789                                     LENGTH='135' and BATTERY='551'
2        789                                     LENGTH='181' and BATTERY='674'
2        789                   BATTERY='551' and LENGTH='181' and CHARGER!='NO'

vladmihaisima
  • 2,119
  • 16
  • 20