8

I have a string that is a mathematical equation, but with some custom functions. I need to find all such functions and replace them with some code.

For example, I have a string:

a+b+f1(f2(x,y),x)

I want code that will replace (say) f2(x,y) with x+y^2 and f1(x,y) with sin(x+y).

It would be ideal if nested functions were supported, like in the example. However, it would still be useful if nesting was not supported.

As I understand from similar topics this can be done using a compiler module like compiler.parse(eq). How I can work with AST object created by compiler.parse(eq) to reconstruct my string back, replacing all found functions?

I need only to perform substitution and then string will be used in other program. Evaluation is not needed.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Alex
  • 2,009
  • 6
  • 24
  • 27
  • in order to reconstruct your string you need to use LVR Traversal over the given AST. – Maher Jan 15 '15 at 10:13
  • Why not tokenize the string? – styvane Jan 15 '15 at 10:16
  • @Maher That is exactly the question. How can I perform LVR Traversal. I need an example. – Alex Jan 15 '15 at 11:24
  • @Styvane if I will simple tokenize the string then, i think, i will be unable to work with nested functions? – Alex Jan 15 '15 at 11:26
  • Out of curiosity: are you trying to do this for some sort of evolutionary algorithm? – inspectorG4dget Jan 15 '15 at 11:29
  • @Alex do you know every possible text sequence in your string? – styvane Jan 15 '15 at 11:35
  • @inspectorG4dget No. Just want to do some automation for measurements generation. – Alex Jan 15 '15 at 11:55
  • Are you open to representing your functions in some other format than as string? – inspectorG4dget Jan 15 '15 at 11:56
  • @Styvane I know all possible functions and mathematical mathematical operations (+-*/) but they can come in any sequence, and equations can be used as arguments of functions. Of cause now i have implementation for singe function. It is simple. I suppose I will find way to deal with multiple functions without nested functions. But I would like to find universal approach that will work without limitations. – Alex Jan 15 '15 at 12:02
  • @inspectorG4dget I think no. Humans need to be able to write equations :) I assume they will not want to create something that is not a string :) – Alex Jan 15 '15 at 12:05
  • Don't have the time to write up the solution right now, but take a look at the `parse` function [here](https://github.com/inspectorG4dget/Course-Calendar/blob/master/CourseCalendar/Scrape.py). There's something there that you can extend for your purpose – inspectorG4dget Jan 15 '15 at 15:05
  • I assume the formulas are not very big (~~1 line)? Why can't you do this trivially with a regexp find-and-substitute (maybe nesting does you in)? If you can't, I can show you a non-python answer that can do arbitrary replacements, including "nested" replacements, all based on ASTs. – Ira Baxter Jan 29 '15 at 07:01
  • What does your external program think of the `^` and `**` operators? – Eric Jan 29 '15 at 13:39
  • @IraBaxter It is difficult to deal with functions inside functions and multiple levels of brackets with regexp. Also programming is not my main job and i need relatively simple (not time consuming) solution. – Alex Feb 01 '15 at 13:04
  • @Eric ^ is xor function ** is exponent – Alex Feb 01 '15 at 13:06
  • Do you care whether the result has redundant parentheses? That is, if you give the function `a+b*x`, would you mind if you got back `(a+(b*x))` (independent of function substitution)? Or in other words, is the converted expression intended for further processing or for display to human beings? – rici Feb 01 '15 at 16:19
  • And another question: do your functions have free variables? – rici Feb 02 '15 at 00:36
  • @rici The converted expression intended for further processing, but also displayed to human beings. But redundant parentheses can be ok. – Alex Feb 04 '15 at 12:07
  • @rici what you mean by free variables? – Alex Feb 04 '15 at 12:07

6 Answers6

9

Here is a minimal working example (+, - , *, /, ** binary and unary operations and function call implemented). The priority of operations are set with parenthesis.

A little bit more than the functionality for the example given is done:

from __future__ import print_function
import ast

def transform(eq,functions):
    class EqVisitor(ast.NodeVisitor):
        def visit_BinOp(self,node):
            #generate("=>BinOp")
            generate("(")
            self.visit(node.left)
            self.visit(node.op)
            #generate("ici",str(node.op),node._fields,node._attributes)
            #generate(dir(node.op))
            self.visit(node.right)
            generate(")")
            #ast.NodeVisitor.generic_visit(self,node)
        def visit_USub(self,node):
            generate("-")
        def visit_UAdd(self,node):
            generate("+")

        def visit_Sub(self,node):
            generate("-")
        def visit_Add(self,node):
            generate("+")
        def visit_Pow(self,node):
            generate("**")
        def visit_Mult(self,node):
            generate("*")
        def visit_Div(self,node):
            generate("/")
        def visit_Name(self,node):
            generate(node.id)
        def visit_Call(self,node):
            debug("function",node.func.id)
            if node.func.id in functions:
                debug("defined function")
                func_visit(functions[node.func.id],node.args)
                return
            debug("not defined function",node.func.id)
            #generate(node._fields)
            #generate("args")
            generate(node.func.id)
            generate("(")
            sep = ""
            for arg in node.args:
                generate (sep)
                self.visit(arg)
                sep=","
            generate(")")
        def visit_Num(self,node):
            generate(node.n)
        def generic_visit(self, node):


            debug ("\n",type(node).__name__)
            debug (node._fields)
            ast.NodeVisitor.generic_visit(self, node)

    def func_visit(definition,concrete_args):
        class FuncVisitor(EqVisitor):
            def visit_arguments(self,node):
                #generate("visit arguments")
                #generate(node._fields)
                self.arguments={}
                for concrete_arg,formal_arg in zip(concrete_args,node.args):
                    #generate(formal_arg._fields)
                    self.arguments[formal_arg.id]=concrete_arg
                debug(self.arguments)
            def visit_Name(self,node):
                debug("visit Name",node.id)
                if node.id in self.arguments:
                    eqV.visit(self.arguments[node.id])
                else:
                    generate(node.id)


        funcV=FuncVisitor()
        funcV.visit(ast.parse(definition))

    eqV=EqVisitor()
    result = []
    def generate(s):
        #following line maybe usefull for debug
        debug(str(s))
        result.append(str(s))
    eqV.visit(ast.parse(eq,mode="eval"))
    return "".join(result)
def debug(*args,**kwargs):
    #print(*args,**kwargs)
    pass

Usage:

functions= {
    "f1":"def f1(x,y):return x+y**2",
    "f2":"def f2(x,y):return sin(x+y)",
}
eq="-(a+b)+f1(f2(+x,y),z)*4/365.12-h"
print(transform(eq,functions))

Result

((-(a+b)+(((sin((+x+y))+(z**2))*4)/365.12))-h)

WARNING

The code works with Python 2.7 and as it is AST dependent is not guaranteed to work with another version of Python. The Python 3 version doesn't work.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Xavier Combelle
  • 10,968
  • 5
  • 28
  • 52
  • 1
    Seems to work fine. Currently the only minor problem is redundant parentheses, but it still ok. Thank you. – Alex Feb 04 '15 at 12:05
  • To remove the redundant parentheses you have to take account the operator priority. I prefered not to do. The expression is correnct even if it could be simplifyrd – Xavier Combelle Feb 04 '15 at 13:29
3

Do you know the variables beforehand?

I recommend using SymPy!

Take for example the following:

import sympy

a,b,x,y = sympy.symbols('a b x y')
f1 = sympy.Function('f1')
f2 = sympy.Function('f2')

readString = "a+b+f1(f2(x,y),x)"

z = eval(readString)

'z' will now be a symbolic term representing the mathematical formula. You can print it out. You can then use subs to replace symbolic terms or functions. You can either represent sine symbolically again (like f1 and f2) or you can possibly use the sin() in sympy.mpmath.

Depending on your needs, this approach is great because you can eventually compute, evaluate or simplify this expression.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
adrianX
  • 619
  • 7
  • 21
3

The full substitution is quite tricky. Here is my attempt to do it. Here we can successfully inline expressions, but not in all scenarios. This code works on AST only, made by ast module. And uses codegen to stringify it back to code. The stringifying of ast and modifying ast in general is covered in other SO Q/A: "Parse a .py file, read the AST, modify it, then write back the modified source code".

First we define few helpers:

import ast
import codegen
import copy

def parseExpr(expr):
    # Strip:
    # Module(body=[Expr(value=
    return ast.parse(expr).body[0].value

def toSource(expr):
    return codegen.to_source(expr)

After that we define a substitution function using NodeTransformer. For example:

substitute(parseExpr("a + b"), { "a": parseExpr("1") }) # 1 + b

The simulatenous substitution of multiple variables is needed to properly avoid nasty situations. For example substituting both a and b for a + b in a + b. The result should be (a + b) + (a + b), but if we substitute first a for a + b, we'll get (a + b) + b, and then substitute b, we'll get (a + (a + b)) + b which is the wrong result! So simultaneous is important:

class NameTransformer(ast.NodeTransformer):
    def __init__(self, names):
        self.names = names

    def visit_Name(self, node):
        if node.id in self.names:
            return self.names[node.id]
        else:
            return node

def substitute(expr, names):
    print "substitute"
    for varName, varValue in names.iteritems():
        print "  name " + varName + " for " + toSource(varValue)
    print "  in " + toSource(expr)
    return NameTransformer(names).visit(expr)

Then we write similar NodeTransformer to find calls, where we can inline function definitions:

class CallTransformer(ast.NodeTransformer):
    def __init__(self, fnName, varNames, fnExpr):
        self.fnName = fnName
        self.varNames = varNames
        # substitute in new fn expr for each CallTransformer
        self.fnExpr = copy.deepcopy(fnExpr)
        self.modified = False

    def visit_Call(self, node):
        if (node.func.id == self.fnName):
            if len(node.args) == len(self.varNames):
                print "expand call to " + self.fnName + "(" + (", ".join(self.varNames)) + ")" + " with arguments "+ ", ".join(map(toSource, node.args))
                # We substitute in args too!
                old_node = node
                args = map(self.visit, node.args)
                names = dict(zip(self.varNames, args))
                node = substitute(self.fnExpr, names)
                self.modified = True
                return node
            else:
                raise Exception("invalid arity " + toSource(node))
        else:
            return self.generic_visit(node)

def substituteCalls(expr, definitions, n = 3):
    while True:
        if (n <= 0):
            break
        n -= 1

        modified = False
        for fnName, varNames, fnExpr in definitions:
            transformer = CallTransformer(fnName, varNames, fnExpr)
            expr = transformer.visit(expr)
            modified = modified or transformer.modified

        if not modified:
            break

    return expr

The substituteCalls is recursive so we can inline recursive functions too. Also there is an explicit limit, because some definitions might be infinitely recursive (as fact below). There is a bit of ugly looking copying, but it is required to separate different subtrees.


And the example code:

if True:
    print "f1 first, unique variable names"
    ex = parseExpr("a+b+f1(f2(x, y), x)")
    ex = substituteCalls(ex, [
        ("f1", ["u", "v"], parseExpr("sin(u + v)")),
        ("f2", ["i", "j"], parseExpr("i + j ^ 2"))])
    print toSource(ex)
    print "---"

if True:
    print "f1 first"
    ex = parseExpr("a+b+f1(f2(x, y), x)")
    ex = substituteCalls(ex, [
        ("f1", ["x", "y"], parseExpr("sin(x + y)")),
        ("f2", ["x", "y"], parseExpr("x + y ^ 2"))])
    print toSource(ex)
    print "---"

if True:
    print "f2 first"
    ex = parseExpr("f1(f1(x, x), y)")
    ex = substituteCalls(ex, [
        ("f1", ["x", "y"], parseExpr("x + y"))])
    print toSource(ex)
    print "---"

if True:
    print "fact"
    ex = parseExpr("fact(n)")
    ex = substituteCalls(ex, [
        ("fact", ["n"], parseExpr("n if n == 0 else n * fact(n-1)"))])
    print toSource(ex)
    print "---"

Which prints out:

f1 first, unique variable names
expand call to f1(u, v) with arguments f2(x, y), x
substitute
  name u for f2(x, y)
  name v for x
  in sin((u + v))
expand call to f2(i, j) with arguments x, y
substitute
  name i for x
  name j for y
  in ((i + j) ^ 2)
((a + b) + sin((((x + y) ^ 2) + x)))
---
f1 first
expand call to f1(x, y) with arguments f2(x, y), x
substitute
  name y for x
  name x for f2(x, y)
  in sin((x + y))
expand call to f2(x, y) with arguments x, y
substitute
  name y for y
  name x for x
  in ((x + y) ^ 2)
((a + b) + sin((((x + y) ^ 2) + x)))
---
f2 first
expand call to f1(x, y) with arguments f1(x, x), y
expand call to f1(x, y) with arguments x, x
substitute
  name y for x
  name x for x
  in (x + y)
substitute
  name y for y
  name x for (x + x)
  in (x + x)
((x + x) + ((x + x) + x))
---
fact
expand call to fact(n) with arguments n
substitute
  name n for n
  in n if (n == 0) else (n * fact((n - 1)))
expand call to fact(n) with arguments (n - 1)
substitute
  name n for (n - 1)
  in n if (n == 0) else (n * fact((n - 1)))
expand call to fact(n) with arguments ((n - 1) - 1)
substitute
  name n for ((n - 1) - 1)
  in n if (n == 0) else (n * fact((n - 1)))
n if (n == 0) else (n * (n - 1) if ((n - 1) == 0) else ((n - 1) * ((n - 1) - 1) if (((n - 1) - 1) == 0) else (((n - 1) - 1) * fact((((n - 1) - 1) - 1)))))

Unfortunately codegen version in pypi is buggy. It doesn't parenthesise expressions properly, even AST says they should. I used jbremer/codegen (pip install git+git://github.com/jbremer/codegen). It adds unnecessary parenthesis too, but it's better than no at all. Thanks to @XavierCombelle for the tip.


The substitution gets trickier if you have anonymous functions, i.e lambda. Then you need to rename variables. You could try to search for lambda calculus with substitution or implementation. Yet I had bad luck to find any articles which use Python for the task.

Community
  • 1
  • 1
phadej
  • 11,947
  • 41
  • 78
  • 1
    looks like codegen project is unmaintained and has critical bugs for the use (no parenthesis around binop so no operation priority) the main project has this pending request waiting https://github.com/andreif/codegen/pull/4/commits so if someone wants to use it it would be better to use https://github.com/jbremer/codegen – Xavier Combelle Feb 02 '15 at 15:43
1

What is your long term goal? Is it to evaluate the function or simply perform substitution? In the former case you can simply try this (note that f1 and f2 could also be dynamically defined):

import math
math.sin

def f2(x, y):
    return x + y ** 2

def f1(x, y):
    return math.sin(x + y)

a, b = 1, 2
x, y = 3, 4
eval('a + b + f1(f2(x, y), x)')
# 2.991148690709596

If you want to replace the functions and get back the modified version, you will indeed have to resort to some sort of AST parser. Be careful though with the use of eval, as this opens up a security hole for malicious user input code.

Matt
  • 17,290
  • 7
  • 57
  • 71
  • I need only to perform substitution. Evaluation will be done with other program (simulator) – Alex Jan 15 '15 at 11:20
1

(Using sympy as adrianX suggested, with some extra code.)

Code below converts a given string to a new string after combining given functions. It's hasty and poorly documented, but it works.


WARNING!

Contains exec eval, malicious code could probably have an effected, if input is provided by external users.


UPDATE:

  • Rewrote the whole code. Works in Python 2.7.
  • Function arguments can be separated by comma or whitespace or both.
  • All examples in question and comments are working.

import re
import sympy


##################################################
# Input string and functions

initial_str = 'a1+myf1(myf2(a, b),y)'
given_functions = {'myf1(x,y)': 'cross(x,y)', 'myf2(a, b)': 'value(a,b)'}
##################################################


print '\nEXECUTED/EVALUATED STUFF:\n'


processed_str = initial_str


def fixed_power_op(str_to_fix):
    return str_to_fix.replace('^', '**')


def fixed_multiplication(str_to_fix):
    """
    Inserts multiplication symbol wherever omitted.
    """

    pattern_digit_x = r"(\d)([A-Za-z])"         # 4x -> 4*x
    pattern_par_digit = r"(\))(\d)"             # )4 -> )*4
    pattern_digit_par = r"[^a-zA-Z]?_?(\d)(\()"  # 4( -> 4*(

    for patt in (pattern_digit_x, pattern_par_digit, pattern_digit_par):
        str_to_fix = re.sub(patt, r'\1*\2', str_to_fix)

    return str_to_fix


processed_str = fixed_power_op(processed_str)


class FProcessing(object):

    def __init__(self, func_key, func_body):
        self.func_key = func_key
        self.func_body = func_body

    def sliced_func_name(self):
        return re.sub(r'(.+)\(.+', r'\1', self.func_key)

    def sliced_func_args(self):
        return re.search(r'\((.*)\)', self.func_key).group()

    def sliced_args(self):
        """
        Returns arguments found for given function. Arguments can be separated by comma or whitespace.

        :returns (list)
        """

        if ',' in self.sliced_func_args():
            arg_separator = ','
        else:
            arg_separator = ' '

        return self.sliced_func_args().replace('(', '').replace(')', '').split(arg_separator)

    def num_of_sliced_args(self):
        """
        Returns number of arguments found for given function.
        """
        return len(self.sliced_args())

    def functions_in_function_body(self):
        """
        Detects functions in function body.

        e.g. f1(x,y): sin(x+y**2), will result in "sin"

        :returns (set)
        """

        return set(re.findall(r'([a-zA-Z]+_?\w*)\(', self.func_body))

    def symbols_in_func_body(self):
        """
        Detects non argument symbols in function body.
        """

        symbols_in_body = set(re.findall(r'[a-zA-Z]+_\w*', self.func_body))

        return symbols_in_body - self.functions_in_function_body()


# --------------------------------------------------------------------------------------
# SYMBOL DETECTION (x, y, z, mz,..)


# Prohibited symbols
prohibited_symbol_names = set()
# Custom function names are prohibited symbol names.
for key in given_functions.keys():
    prohibited_symbol_names |= {FProcessing(func_key=key, func_body=None).sliced_func_name()}


def symbols_in_str(provided_str):

    """
    Returns a set of symbol names that are contained in provided string.

    Allowed symbols start with a letter followed by 0 or more letters,
    and then 0 or more numbers (eg. x, x1, Na, Xaa_sd, xa123)
    """
    symbol_pattern = re.compile(r'[A-Za-z]+\d*')
    symbol_name_set = re.findall(symbol_pattern, provided_str)
    # Filters out prohibited.
    symbol_name_set = {i for i in symbol_name_set if (i not in prohibited_symbol_names)}

    return symbol_name_set


# ----------------------------------------------------------------
# EXEC SYMBOLS
symbols_in_given_str = symbols_in_str(initial_str)
# e.g. " x, y, sd = sympy.symbols('x y sd') "
symbol_string_to_exec = ', '.join(symbols_in_given_str)
symbol_string_to_exec += ' = '
symbol_string_to_exec += "sympy.symbols('%s')" % ' '.join(symbols_in_given_str)

exec symbol_string_to_exec


# -----------------------------------------------------------------------------------------
# FUNCTIONS

# Detects secondary functions (functions contained in body of given_functions dict)
sec_functions = set()
for key, val in given_functions.items():
    sec_functions |= FProcessing(func_key=key, func_body=val).functions_in_function_body()


def secondary_function_as_exec_str(func_key):
    """
    Used for functions that are contained in the function body of given_functions.

    E.g.  given_functions = {f1(x): sin(4+x)}

    "my_f1 = sympy.Function('sin')(x)"

    :param func_key: (str)
    :return: (str)
    """

    returned_str = "%s = sympy.Function('%s')" % (func_key, func_key)

    print returned_str
    return returned_str


def given_function_as_sympy_class_as_str(func_key, func_body):
    """
    Converts given_function to sympy class and executes it.

    E.g.    class f1(sympy.Function):
                nargs = (1, 2)

                @classmethod
                def eval(cls, x, y):
                    return cross(x+y**2)

    :param func_key: (str)
    :return: (None)
    """

    func_proc_instance = FProcessing(func_key=func_key, func_body=func_body)

    returned_str = 'class %s(sympy.Function): ' % func_proc_instance.sliced_func_name()
    returned_str += '\n\tnargs = %s' % func_proc_instance.num_of_sliced_args()
    returned_str += '\n\t@classmethod'

    returned_str += '\n\tdef eval(cls, %s):' % ','.join(func_proc_instance.sliced_args())
    returned_str = returned_str.replace("'", '')

    returned_str += '\n\t\treturn %s' % func_body

    returned_str = fixed_power_op(returned_str)

    print '\n', returned_str
    return returned_str


# Executes functions in given_functions' body
for name in sec_functions:
    exec secondary_function_as_exec_str(func_key=name)

# Executes given_functions
for key, val in given_functions.items():
    exec given_function_as_sympy_class_as_str(func_key=key, func_body=val)


final_result = eval(initial_str)


# PRINTING
print '\n' + ('-'*40)
print '\nRESULTS'

print '\nInitial string: \n%s' % initial_str

print '\nGiven functions:'
for key, val in given_functions.iteritems():
    print '%s: ' % key, val

print '\nResult: \n%s' % final_result
user
  • 5,370
  • 8
  • 47
  • 75
  • I tried `initial_str = 'a+myf1(myf2(a, b),y)' function_names = { 'myf1(x,y)': 'cross(x,y)', 'myf2(a, b)': 'value(a,b)'}` But got error – Alex Feb 04 '15 at 12:08
  • @Alex The problem is that `value` and `cross` do not exist in sympy. I have made it able to recognize any sympy functions (`sin, cot, sqrt`, etc). Perhaps they have a different name. I assume `cross` is "cross product", correct? What is `value` though? – user Feb 04 '15 at 12:38
  • @Alex It probably can be fixed if i make a dict with custom names like `cross` matching the equivalent name in sympy. My bad for the poor docstrings i used. Did it work for other function names? – user Feb 04 '15 at 12:44
  • As I wrote, after translation result will be used in other program That program is analog simulator. So cross function is crossing point of two analog signals and value function gives Y value of signal at given X. What I need is general translation of any arbitrary function name to any other arbitrary function name. Of cause names of all functions known in advance – Alex Feb 05 '15 at 15:08
  • Almost. But accepted answer adds a lot of redundant parentheses and also new function need to have comma as separator between arguments when i really need space as separator. – Alex Feb 06 '15 at 19:55
  • @Alex Although you have already accepted a different answer than mine, and there is nothing for me to gain, i ll try to help you out. However, try to be more specific. _i really need space as separator_- That's a new requirement, that you haven't mentioned. Give an example. – user Feb 07 '15 at 08:35
  • Thank you for your efforts. I think it will be useful not only for me, but also for other people who have same problem. I already see four added this question to favorites. I think I not explained well about space as separator. The issue is I usually would like to replace function by function like i would like to replace c(x,y) by cross(x y a b c). So comma separator good for source function but in destination function I need space as separator. – Alex Feb 08 '15 at 13:34
  • Of cause it possible to replace all commas by spaces after substitution. But it will be nice if i will be able to specify destination function with spaces originally. For accepted question I need to write destination function as cross(x,y, a, b, c) – Alex Feb 08 '15 at 13:35
  • Did your try my new code? Does it work as expected? Any bugs? – user Feb 08 '15 at 13:37
0

I think you want to use something like PyBison which is a parser generator.

See an example that contains the basic code you need here:

http://freenet.mcnabhosting.com/python/pybison/calc.py

You need to add a token type for functions, and a rule for functions, and then what happens with that function if it is encountered.

If you need other information about parsing and so on, try to read some basic tutorials on Lex and (Yacc or Bison).

peschü
  • 1,299
  • 12
  • 21