1

Working on a small project, I stumbled across the problem that I cannot find an efficient way of parsing mathematical expressions in Python.

Briefly about the context: the program relies on iterating over a very large (~10 million) number of algebraic expressions in RPN, evaluating each to see if it matches a desired outcome. Unsurprisingly, the slowest part of the approach is evaluating each expression, hence I am keen to streamline it.

The following question is a summary of my findings so far and an invitation to suggest potential improvements to existing methods.

I have tested the following methods:

  • my own method, which is a classic stack implementation of RPN parsing
  • a slightly more generalised approach, found here
  • and an approach using list slices to get the stack items, rather than popping, found here

I have also tested the evil eval to get a reference point for Python's inbuilt parser, as well as just carrying out the calculations "in-line".

Testing was done by attempting to parse each of the following expressions 1 million times and recording the time taken by each function:

  • 2 2 +
  • 2 2 2 + +
  • 2 2 2 2 + + +

Eval was used to evaluate equivalent infix expressions and the same infix expressions were passed directly to python for the final step of testing. Full code is in the bottom of the post.

Findings are as follows: Table, times in seconds With some minor differences, the approaches are broadly similar because they are all list implementations of stacks, so the delays come from mutating lists, I imagine.

The evil eval is significantly worse in comparison. And inline evaluation is obviously lightning-fast.

I understand that parsing performs a great deal of recognition operations in addition to the calculations, but I was hoping there is an approach that is (significantly) more efficient than the ways I looked into here. Maybe there is some Python magic I don't know about... I would welcome any and all pointers from Python gurus, otherwise this post can serve as a summary of RPN parsers for future generations...

import itertools as it
import random
import time
import operator
operators = ["+", "-", "/", "*"]
count = 0


def RPN_Classic_Stack(expression): #Mine
    explist = expression.split(" ")
    explist.pop(-1)
    stack = []

    for char in explist:
        if not char in operators:
            stack.append(int(char))
        else:
            if char == "+":
                num1 = stack.pop()
                num2 = stack.pop()

                result = num1 + num2
                stack.append(result)

            if char == "-":
                num1 = stack.pop()
                num2 = stack.pop()
                result = -num1 + num2

                stack.append(result)

            if char == "*":
                num1 = stack.pop()
                num2 = stack.pop()

                result = num1 * num2
                stack.append(result)

            if char == "/":
                divisor = stack.pop()
                divident = stack.pop()

                try:
                    result = divident / divisor
                except:
                    return [-1]
                stack.append(result)

    return stack.pop()


def safe_divide(darg1, darg2):
    try:
        return darg1/darg2
    except ZeroDivisionError:
        return -1

def RPN_Generalised_Operators(expression): #https://stackoverflow.com/a/37770871/5482177
    function_twoargs = {
        '*': operator.mul,
        '/': safe_divide,
        '+': operator.add,
        '-': operator.sub,
        }
    expression = expression.split(" ")
    stack = []
    for val in expression:
        result = None
        if val in function_twoargs:
            arg2 = stack.pop()
            arg1 = stack.pop()
            result = function_twoargs[val](arg1, arg2)
        else:
            result = float(val)
        stack.append(result)
    return stack.pop()

def RPN_List_Slicing(expression): #https://stackoverflow.com/a/3866502/5482177
    operators = {
    '+':  operator.add, '-':  operator.sub,
    '*':  operator.mul, '/':  operator.truediv, '%':  operator.mod,
    '**': operator.pow, '//': operator.floordiv,
    }
    stack = []
    for val in expression.split():
        if val in operators:
            f = operators[val]
            stack[-2:] = [f(*stack[-2:])]
        else:
            stack.append(int(val))
    return stack.pop()

def not_RPN_Evil_Eval(expression):
    return eval(expression)

expressions = ["2 2 +", "2 2 2 + +", "2 2 2 2 + + +"]
results = {}
parsers = [RPN_Classic_Stack, RPN_Generalised_Operators, RPN_List_Slicing]
for expression in expressions:
    for parser in parsers:
        start = time.time()
        for i in range(0,1000000):
            parser(expression)
        end = time.time()
        results[parser.__name__] = results.get(parser.__name__, {})
        results[parser.__name__][expression] = end-start
        print(results)

non_RPN_expressions = ["2 + 2", "2 + 2 + 2", "2 + 2 + 2 + 2"]
for expression in non_RPN_expressions:
    start = time.time()
    for i in range(0,1000000):
        not_RPN_Evil_Eval(expression)
    end = time.time()
    print(expression, end-start)


############ Inline calculations ###################
start = time.time()
for i in range(0,1000000):
    2 + 2
end = time.time()
print(expression, end-start)

start = time.time()
for i in range(0,1000000):
    2 + 2 + 2
end = time.time()
print(expression, end-start)

start = time.time()
for i in range(0,1000000):
    2 + 2 + 2 + 2
end = time.time()
print(expression, end-start)
##############################
tripleee
  • 175,061
  • 34
  • 275
  • 318
IliaK
  • 19
  • 2
  • Do the expressions have to be strings? In the context of your program you're probably generating them in the first place, if you generate them as some easily evaluatable object you can skip the parsing step. – harold Jan 17 '17 at 16:16
  • Is it possible to store an evaluatable algebraic expression without giving it any values until later? This way I can make a generator that will "remember" the shape of the algebraic expression, and I can yield results from it by sending it values individually... My understanding of generators is very limited at the moment. – IliaK Jan 17 '17 at 20:46
  • Ah, I get it... lambdas. Lots of dynamically generated, horrible-horrible lambdas. Will update question once I get it working. – IliaK Jan 17 '17 at 21:26
  • Is your question about parsing or evaluating? They aren't the same thing. – user207421 Feb 11 '20 at 01:12

0 Answers0