Faster method of evaluating a boolean expression as a string in Python

Question

I have been working on this project for a couple months right now. The ultimate goal of this project is to evaluate an entire digital logic circuit similar to functional testing; just to give a scope of the problem. The topic I created here deals with the issue I'm having with performance of analyzing a boolean expression. For any gate inside a digital circuit, it has an output expression in terms of the global inputs. EX: ((A&B)|(C&D)^E). What I want to do with this expression is then calculate all possible outcomes and determine how much influence each input has on the outcome.

The fastest way that I have found was by building a truth table as a matrix and looking at certain rows (won't go into specifics of that algorithm as it's offtopic), the problem with that is once the number of unique inputs goes above 26-27 (something around that) the memory usage is well beyond 16GB (Max my computer has). You might say "Buy more RAM", but as every increase in inputs by 1, memory usage doubles. Some of the expressions I analyze are well over 200 unique inputs...

The method I use right now uses the compile method to take the expression as the string. Then I create an array with all of the inputs found from the compile method. Then I generate a list row by row of "True" and "False" randomly chosen from a sample of possible values (that way it will be equivalent to rows in a truth table if the sample size is the same size as the range and it will allow me to limit the sample size when things get too long to calculate). These values are then zipped with the input names and used to evaluate the expression. This will give the initial result, after that I go column by column in the random boolean list and flip the boolean then zip it with the inputs again and evaluate it again to determine if the result changed.

So my question is this: Is there a faster way? I have included the code that performs the work. I have tried regular expressions to find and replace but it is always slower (from what I've seen). Take into account that the inner for loop will run N times where N is the number of unique inputs. The outside for loop I limit to run 2^15 if N > 15. So this turns into eval being executed Min(2^N, 2^15) * (1 + N)...

As an update to clarify what I am asking exactly (Sorry for any confusion). The algorithm/logic for calculating what I need is not the issue. I am asking for an alternative to the python built-in 'eval' that will perform the same thing faster. (take a string in the format of a boolean expression, replace the variables in the string with the values in the dictionary and then evaluate the string).

#value is expression as string
comp = compile(value.strip(), '-', 'eval')
inputs = comp.co_names
control = [0]*len(inputs)

#Sequences of random boolean values to be used
random_list = gen_rand_bits(len(inputs))


for row in random_list:
    valuedict = dict(zip(inputs, row))
    answer = eval(comp, valuedict)

    for column in range(len(row)):
        row[column] = ~row[column]

        newvaluedict = dict(zip(inputs, row))
        newanswer = eval(comp, newvaluedict)

        row[column] = ~row[column]

        if answer != newanswer:
            control[column] = control[column] + 1

I recommend not to use `eval()` you can read about it [here](http://stackoverflow.com/a/1087625/1982962). — Kobi K, Dec 23 '13 at 08:17
"how much influence each input has on the outcome" - could you clarify what you mean by this part? Depending on what you want, your problem may be [NP-hard](http://en.wikipedia.org/wiki/NP-hard), but it may still be feasible. — user2357112, Dec 23 '13 at 08:17
I really hope you are not falling for the old [X Y Problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). Please read that and make sure this is really what you want help with, and not instead figuring out a better way to implement your idea from the get-go, not just how to implement the solution you thought of. — Inbar Rose, Dec 23 '13 at 08:19
@K DawG I know and since he is trying to use it for now at a harmless script he should be aware of the risks so when he'll create another script he will know if he can use it or not, I just recommended it :) — Kobi K, Dec 23 '13 at 08:25
@user2357112 influence is determined by how many outcomes that single input can change the resulting output. for instance, hold all other inputs to a fixed value then check the input currently looking at by switching its value, if the whole output changes then this input has control for this case. Repeat for all possible outcomes to determine the ratio of control to total outcomes. EX A&B 4 possible combinations (00, 01, 10, 11), A affects the output 2 out of those 4 combinations (when B==1). And this is an NP-Complete problem. — Jeffb, Dec 23 '13 at 08:50
@ Kobi K. I am aware of the security issue with eval, but that is what I am trying to find a better method for. (although my reason is for performance than security). — Jeffb, Dec 23 '13 at 08:51
@InbarRose No My method I described is implemented and working fine, I am trying to find a better method that does not use eval and performs faster. I just wanted to give a background of my attempts at the problem before and a scope of the project as a whole. Right now my method can calculate the influence of all inputs in an expression relatively fast if only looking at 1 expression. The extreme side of an expression i have seen was around 250 unique inputs and it took I think about 2-3 hours to evaluate. This might not seem bad but when I want to evaluate 20,000 expressions..... — Jeffb, Dec 23 '13 at 08:56
@Jeffb are you aware that changing one input a time can't give you stable results? for example if starting (randomly) from a and b falses with a&b&c, no matter which value has c, you can't change result with only a or only b. — alko, Dec 23 '13 at 09:52
if you're satisfied with logic, you can slightly improve your code with not recontructing zip/dict each time, but using `valuedict[inputs[column]] = not valuedict[inputs[column]]` — alko, Dec 23 '13 at 09:57
Just to make sure that I understand this correctly: Your actual problem is to determine the relative influence of each variable within a boolean expression on the outcome of said expression? — poke, Dec 23 '13 at 10:18
@poke That is what I am calculating but my problem is not with how I calculate it logically but with my use of the python 'eval' built-in to perform evaluating. — Jeffb, Dec 23 '13 at 10:29
@alko Thanks again for that code, just that small change improved runtime on a test circuit from 23.6 seconds to 14.3 seconds. Inefficient uses of python code seems to be my problem. I wonder if this speed up will scale linearly since i was unnecessarily creating a new dictionary that many times... — Jeffb, Dec 23 '13 at 10:44
probably worth reading http://stackoverflow.com/questions/14902141/any-good-boolean-expression-simplifiers-out-there — alko, Dec 23 '13 at 14:52
Binary Decision Diagrams are relevant here. They're a canonical representation of binary logic, that have the neat effect of totally reducing out the `B` from `A | (A & B)`. You can also use them to find the influence factor of different variables, by permuting the variable order to find the minimum graph. — pidge, Apr 06 '18 at 04:10

score 5 · Answer 1 · edited Mar 20 '17 at 10:29

My question:

Just to make sure that I understand this correctly: Your actual problem is to determine the relative influence of each variable within a boolean expression on the outcome of said expression?

OP answered:

That is what I am calculating but my problem is not with how I calculate it logically but with my use of the python eval built-in to perform evaluating.

So, this seems to be a classic XY problem. You have an actual problem which is to determine the relative influence of each variable within the a boolean expression. You have attempted to solve this in a rather ineffective way, and now that you actually “feel” the inefficiency (in both memory usage and run time), you look for ways to improve your solution instead of looking for better ways to solve your original problem.

In any way, let’s first look at how you are trying to solve this. I’m not exactly sure what gen_rand_bits is supposed to do, so I can’t really take that into account. But still, you are essentially trying out every possible combination of variable assignments and see if flipping the value for a single variable changes the outcome of the formula result. “Luckily”, these are just boolean variables, so you are “only” looking at 2^N possible combinations. This means you have exponential run time. Now, O(2^N) algorithms are in theory very very bad, while in practice it’s often somewhat okay to use them (because most have an acceptable average case and execute fast enough). However, being an exhaustive algorithm, you actually have to look at every single combination and can’t shortcut. Plus the compilation and value evaluation using Python’s eval is apparently not so fast to make the inefficient algorithm acceptable.

So, we should look for a different solution. When looking at your solution, one might say that more efficient is not really possible, but when looking at the original problem, we can argue otherwise.

You essentially want to do things similar to what compilers do as static analysis. You want to look at the source code and analyze it just from there without having to actually evaluate that. As the language you are analyzing is highly restricted (being only a boolean expression with very few operators), this isn’t really that hard.

Code analysis usually works on the abstract syntax tree (or an augmented version of that). Python offers code analysis and abstract syntax tree generation with its ast module. We can use this to parse the expression and get the AST. Then based on the tree, we can analyze how relevant each part of an expression is for the whole.

Now, evaluating the relevance of each variable can get quite complicated, but you can do it all by analyzing the syntax tree. I will show you a simple evaluation that supports all boolean operators but will not further check the semantic influence of expressions:

import ast

class ExpressionEvaluator:
    def __init__ (self, rawExpression):
        self.raw = rawExpression
        self.ast = ast.parse(rawExpression)

    def run (self):
        return self.evaluate(self.ast.body[0])

    def evaluate (self, expr):
        if isinstance(expr, ast.Expr):
            return self.evaluate(expr.value)
        elif isinstance(expr, ast.Name):
            return self.evaluateName(expr)
        elif isinstance(expr, ast.UnaryOp):
            if isinstance(expr.op, ast.Invert):
                return self.evaluateInvert(expr)
            else:
                raise Exception('Unknown unary operation {}'.format(expr.op))
        elif isinstance(expr, ast.BinOp):
            if isinstance(expr.op, ast.BitOr):
                return self.evaluateBitOr(expr.left, expr.right)
            elif isinstance(expr.op, ast.BitAnd):
                return self.evaluateBitAnd(expr.left, expr.right)
            elif isinstance(expr.op, ast.BitXor):
                return self.evaluateBitXor(expr.left, expr.right)
            else:
                raise Exception('Unknown binary operation {}'.format(expr.op))
        else:
            raise Exception('Unknown expression {}'.format(expr))

    def evaluateName (self, expr):
        return { expr.id: 1 }

    def evaluateInvert (self, expr):
        return self.evaluate(expr.operand)

    def evaluateBitOr (self, left, right):
        return self.join(self.evaluate(left), .5, self.evaluate(right), .5)

    def evaluateBitAnd (self, left, right):
        return self.join(self.evaluate(left), .5, self.evaluate(right), .5)

    def evaluateBitXor (self, left, right):
        return self.join(self.evaluate(left), .5, self.evaluate(right), .5)

    def join (self, a, ratioA, b, ratioB):
        d = { k: v * ratioA for k, v in a.items() }
        for k, v in b.items():
            if k in d:
                d[k] += v * ratioB
            else:
                d[k] = v * ratioB
        return d

expr = '((A&B)|(C&D)^~E)'
ee = ExpressionEvaluator(expr)
print(ee.run())
# > {'A': 0.25, 'C': 0.125, 'B': 0.25, 'E': 0.25, 'D': 0.125}

This implementation will essentially generate a plain AST for the given expression and the recursively walk through the tree and evaluate the different operators. The big evaluate method just delegates the work to the type specific methods below; it’s similar to what ast.NodeVisitor does except that we return the analyzation results from each node here. One could augment the nodes instead of returning it instead though.

In this case, the evaluation is just based on ocurrence in the expression. I don’t explicitely check for semantic effects. So for an expression A | (A & B), I get {'A': 0.75, 'B': 0.25}, although one could argue that semantically B has no relevance at all to the result (making it {'A': 1} instead). This is however something I’ll leave for you. As of now, every binary operation is handled identically (each operand getting a relevance of 50%), but that can be of course adjusted to introduce some semantic rules.

In any way, it will not be necessary to actually test variable assignments.

I had thought of other ways of solving this problem but it always a lead to a dead end (non-functional code); hence the reason I stuck with the expression analysis. Your response is very insightful though, I will definitely try to modify and implement it to give 'correct' values - EX: expressions such as `A | (A & B)` and `A & (A | B)` etc. like you mention. From those expressions it isn't hard to see `{'A': 1}`, but an expression such as `(A & B) | (A & D)` would be a little more difficult to arrive at `{'A': 0.75, 'B': 0.25, 'D': 0.25}`. I'm not sure how I could program that in though. — Jeffb, Dec 23 '13 at 18:13
One solution I had thought of previously was actually similar to your answer. I would start at a base expression and build up; to find the new value of A, for example, I would take A from the previous expression and multiply it by the probability the other input side is non-affecting. Take `(A|B)&(C|D)` I would multiply `A * prob( (C|D) == 1)` since `(C|D)` has to be 1 for A to be able to change the output. To handle A appearing on the other side `(A|B) & (A|C)` I came up with `left(A) * prob( (A|D) == 1) + right(A) * prob( (A|B) == 1)`. This does not work for all cases, ie `(A|B|C) & (A|C)` — Jeffb, Dec 23 '13 at 18:39

Abhijit · Answer 2 · 2013-12-23T08:29:17.983

0

Instead of reinventing the wheel and getting into risk like performance and security which you are already in, it is better to search for industry ready well accepted libraries.

Logic Module of sympy would do the exact thing that you want to achieve without resorting to evil ohh I meant eval. More importantly, as the boolean expression is not a string you don;t have to care about parsing the expression which generally turns out to be the bottleneck.

edited Dec 23 '13 at 08:29

answered Dec 23 '13 at 08:22

Abhijit

62,056
18
131
204

Sorry I forgot to mention I have already tried sympy and it was much slower. My boolean expression is stored as a string in the format of 'A&B|(C|D)' etc. where I then plug in 'True'/'False' combinations for unique inputs (A,B,C,D etc) and evaluate it. If I remember correctly, sympy did not handle xor '^' as well. – Jeffb Dec 23 '13 at 09:01

Timothy · Answer 3 · 2013-12-23T10:09:16.820

0

You don't have to prepare a static table for computing this. Python is a dynamic language, thus it's able to interpret and run a code by itself during runtime.

In you case, I would suggest a soluation that:

import random, re, time

#Step 1: Input your expression as a string
logic_exp = "A|B&(C|D)&E|(F|G|H&(I&J|K|(L&M|N&O|P|Q&R|S)&T)|U&V|W&X&Y)"

#Step 2: Retrieve all the variable names.
#        You can design a rule for naming, and use regex to retrieve them.
#        Here for example, I consider all the single-cap-lettler are variables.
name_regex = re.compile(r"[A-Z]")

#Step 3: Replace each variable with its value. 
#        You could get the value with reading files or keyboard input.
#        Here for example I just use random 0 or 1.
for name in name_regex.findall(logic_exp):
    logic_exp = logic_exp.replace(name, str(random.randrange(2)))

#Step 4: Replace the operators. Python use 'and', 'or' instead of '&', '|' 
logic_exp = logic_exp.replace("&", " and ")
logic_exp = logic_exp.replace("|", " or " )    


#Step 5: interpret the expression with eval(exp) and output its value.
print "exporession =", logic_exp  
print "expression output =",eval(logic_exp)

This would be very fast and take very little memory. For a test, I run the example above with 25 input variables:

exporession = 1 or 1 and (1 or 1) and 0 or (0 or 0 or 1 and (1 and 0 or 0 or (0 and 0 or 0 and 0 or 1 or 0 and 0 or 0) and 1) or 0 and 1 or 0 and 1 and 0)
expression output= 1
computing time: 0.000158071517944 seconds

According to your comment, I see that you are computing all the possible combinations instead of the output at a given input values. If so, it would become a typical NP-complete Boolean satisfiability problem. I don't think there's any algorithm that could make it by a complexity lower than O(2^N). I suggest you to search with the keywords fast algorithm to solve SAT problem, you would find a lot of interesting things.

edited Dec 23 '13 at 10:09

answered Dec 23 '13 at 09:19

Timothy

4,467
5
28
51

I wish you solution was faster than mine. You are evaluating 1 expression for only 1 possible combination of input values. For my purposes I would evaluate that expression for all possible combinations which is 2^25 (although my algorithm limits to 2^15). Then for each combination evaluated I check all inputs by inverting their value individually and re-evaluating. a resulting code with similar runs would be `for i in range(2**15): for j in range(25): for name in name_regex.findall(exp): exp = exp.replace(name, str(random.randrange(2))) eval(exp) exp = value` – Jeffb Dec 23 '13 at 09:56
I see. If you want to calculate all the possible combinations, then it would become a NP-complete problem. I don't think there's any algorithm that could make the complexity lower than `O(2^N)`:( – Timothy Dec 23 '13 at 10:03
`eval(exp)` and `exp = value` happen every iteration of the second `for` loop: `for j in range(25)`. as it stands my program can run through your expression and determine influence of the inputs in approx. 6.7 seconds not including the time it takes to generate a sample of size 2^15 containing random non repeating sequences 25 bits in length. (which would bump runtime up to 7.7 seconds - not very efficient but that is not something I'm currently working on to improve atm) – Jeffb Dec 23 '13 at 10:09
Yes, it is NP-Complete for the most part, until it hits 15 inputs in my algorithm; then it becomes ((2**15) * N) I know there is no faster way to calculate what I am calculating in terms of the complexity, but I was hoping there was a faster alternative to using 'eval' – Jeffb Dec 23 '13 at 10:13
So far the only things I can think of to improve speed is make this massively parallel. It would not improve the runtime of 1 expression but when dealing with 20,000 expressions having 20,000 cores (each core assigned to an expression) would make things very fast (obviously lol) but very expensive. I have it implemented with multiprocessing but 4 cores is not much better than 1 core with that many expressions. – Jeffb Dec 23 '13 at 10:23
@Jeffb Yes, parallelization would be your friend. But it's really expensive. I rent 1200 cores. The bill makes a pain every single month:( – Timothy Dec 23 '13 at 10:25

Faster method of evaluating a boolean expression as a string in Python

3 Answers3