Suppose I have a SymPy expression. It seems to me I can only substitute symbols with numbers. Can I substitute them with something else, such as a pandas Series? For example:

from sympy import Symbol, Function
a_sym = Symbol('a')
b_sym = Symbol('b')
sum_func_sym = Function('sum_func')
expression = sum_func_sym(a_sym+b_sym)

Is there a way to substitute a_sym and b_sym with pandas Series, replace sum_func_sym with the Series sum, and then calculate the result?

import pandas as pd
df = pd.DataFrame({'a': [1,2], 'b': [3,4]})
a = df.a
b = df.b

def sum_func(series):
    return series.sum()

When I do the substitution and replacement, I get an error:

expression.subs(a_sym, a).subs(b_sym, b).replace(sum_func_sym, sum_func)
AttributeError: 'Add' object has no attribute 'sum'

lingxiao
  • What is your expected output from the sample dataframe? – Mayank Porwal Oct 22 '20 at 05:21
  • `(df.a+df.b).sum() = 10` in this case – lingxiao Oct 22 '20 at 05:23
  • `sympy` has a `lambdify` function that can turn an expression into a `numpy` function (or `mpmath` and a few other options). Using symbols in numpy arrays is possible, but often raises problems, as can be seen in many SO questions. `pandas` is another step removed from `sympy`, and I don't recall anyone else trying to use them together. – hpaulj Oct 22 '20 at 05:41
  • hmmm, basically i want the following but i'm not sure about using `eval`: `eval(expression.__repr__())` – lingxiao Oct 22 '20 at 05:52
  • You cannot substitute a data frame into a sympy expression. Take a step back and think about what is the actual problem you are trying to solve and then ask a new question about that. – Oscar Benjamin Oct 22 '20 at 11:22
  • I have expressions like this: `df.a.sum()*(df.a*df.b).sum()/df.a.sum()`. I cannot programmatically figure out that there is no need to calculate `df.a.sum()`, because `df.a.sum()*(df.a*df.b).sum()/df.a.sum() = (df.a*df.b).sum()`. Hence I want to use sympy to simplify the expression programmatically before I put the values in. – lingxiao Oct 23 '20 at 04:49
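
The cancellation described in the last comment can be sketched purely symbolically, before any data is involved. This is only an illustration: it reuses the question's placeholder `sum_func_sym` to stand in for `Series.sum()`, and it relies on SymPy treating `sum_func(a)` as an ordinary nonzero factor, which is an assumption about the data (the column sum must not be zero).

```python
from sympy import Function, Symbol

a_sym = Symbol('a')
b_sym = Symbol('b')
# Undefined function used as a stand-in for Series.sum()
sum_func_sym = Function('sum_func')

# Mirrors df.a.sum() * (df.a * df.b).sum() / df.a.sum()
expr = sum_func_sym(a_sym) * sum_func_sym(a_sym * b_sym) / sum_func_sym(a_sym)

# SymPy cancels the common factor automatically when building the product
print(expr)
```

Only one application of `sum_func` remains in `expr`, so evaluating it later needs a single pass over the data.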

2 Answers

Building upon this answer, I came up with the following implementation that seems to work for at least fairly simple use cases:

import pandas as pd
from sympy import Symbol, lambdify, symbols
from sympy.parsing.sympy_parser import parse_expr

df = pd.DataFrame({'a': range(5), 'b': range(5)})
my_vars = symbols('a b')  # have to have same names as DataFrame columns
expr = parse_expr('a+sqrt(b)+1')  # SymPy's square root is lowercase sqrt
# Create callable version of the expression
callable_obj = lambdify(my_vars, expr)
# Call the object, passing in the DataFrame columns as parameters
# Write the result in a new column of the dataframe
df['result'] = callable_obj(**{
  str(sym): df[str(sym)]  # pass column as variable with the same name
  for sym in expr.atoms()  # for all atomic expressions...
  if isinstance(sym, Symbol)  # that are Symbols (not constants)
})

The output is (as expected):

0    1.000000
1    3.000000
2    4.414214
3    5.732051
4    7.000000
dtype: float64
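
The same `lambdify` approach might also cover the question's `sum_func` case. The sketch below assumes that passing a dictionary in `modules` is an acceptable way to bind the undefined SymPy function name `sum_func` to the Python function of the same name; the variable name `evaluate` is made up for illustration.

```python
import pandas as pd
from sympy import Function, Symbol, lambdify

a_sym = Symbol('a')
b_sym = Symbol('b')
sum_func_sym = Function('sum_func')
expression = sum_func_sym(a_sym + b_sym)

def sum_func(series):
    # Plain Python implementation that receives the evaluated argument
    return series.sum()

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Map the name 'sum_func' to the Python function; fall back to numpy otherwise
evaluate = lambdify([a_sym, b_sym], expression,
                    modules=[{'sum_func': sum_func}, 'numpy'])

# The Series are passed as ordinary arguments, so a + b is a Series addition
result = evaluate(df['a'], df['b'])
print(result)  # 10
```

Here the Series never enter the SymPy expression itself. They are only supplied when the generated function is called, which sidesteps the `AttributeError` from `subs`.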
Henning

I assume that you have a dataframe with many columns and you want to add two of them. However, the names of the columns to be added are variables, unknown beforehand. Here is the solution for this case. f-strings work in Python 3.6+; for other versions, please modify appropriately.

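def sum_columns(a, b):  # renamed from sum to avoid shadowing the builtin
    # a and b are column names; df is the module-level DataFrame, modified in place
    df[f'sum_of_{a}_and_{b}'] = df[a] + df[b]
    # for a more general function than a sum:
    # df[f'sum_of_{a}_and_{b}'] = df.apply(lambda x: f(x[a], x[b]), axis=1)
    # where f is the function used instead of the sum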
Kate Melnykova
  • no, the question is more related to `sympy`, basically if i have a `sympy` expression, it seems to me i can only substitute symbols with numbers. the question is can i substitute it with something else like a pandas series. – lingxiao Oct 22 '20 at 05:33
  • You want to apply a sympy function entry-wise to the pandas dataframe, right? Does the function of interest accept numbers, or does it depend on variables? – Kate Melnykova Oct 22 '20 at 05:39
  • no i don't want to apply the sympy function entry wise. i want to substitute the whole symbol. basically in the above example i want `expression = sum_func_sym(a_sym+b_sym) = sum_func_sym(df.a+df.b) =(df.a+df.b).sum() = 10` – lingxiao Oct 22 '20 at 05:41
  • Sorry, I would need a more specific example to fully get the issue. Say, your special function is integral from `a` to `b`. You provide column names: the first column contains all lower limits, and the second column contains the upper limits. I assume that you want to apply the logic row-wise. Then, the commented formula does exactly what's above. – Kate Melnykova Oct 22 '20 at 05:49
  • the question is actually just asking how to substitute a sympy symbol with a pandas series and evaluate the result. for example, if i have `a_sym = Symbol('a')` and `expression = a_sym`, how do i substitute `a_sym` with `pd.Series([1,2,3])` so that `expression` becomes `pd.Series([1,2,3])`. – lingxiao Oct 22 '20 at 05:59