
As my question title states, I would like to invoke custom functions at run time on a DataFrame via DataFrame.eval. The custom functions would, for example, calculate the difference between two dates (i.e. age), convert years to months, or find the max/min of two columns.

So far, I have succeeded with arithmetic operations and a few functions such as abs() and sqrt(), but I couldn't get min()/max() working. The following work:

df.eval('TT = sqrt(Q1)',inplace=True)
df.eval('TT1 = abs(Q1-Q2)',inplace=True)
df.eval('TT2 = (Q1+Q2)*Q3',inplace=True)

The following code works with Python's built-in eval(). How can I do the same with DataFrame.eval?

import numpy as np

def find_max(x,y):
    return np.maximum(x,y)

eval('find_max')(4,7)  # built-in eval looks the function up by name; evaluates to 7

def find_age(date_col1,date_col2):
    # I know how to calculate the age, but how do I call this function
    # with df.eval and assign the result to a new column?
    ...

Sample dataframe:

import pandas as pd

op_d = {'ID': [1, 2, 3], 'V': ['F', 'G', 'H'], 'AAA': [0, 1, 1], 'D': ['2019/12/04', '2019/02/01', '2019/01/01'], 'DD': ['2019-12-01', '2016-05-31', '2015-02-15'], 'CurrentRate': [7.5, 2, 2], 'NoteRate': [2, 3, 3], 'BBB': [0, 4, 4], 'Q1': [2, 8, 10], 'Q2': [3, 5, 7], 'Q3': [5, 6, 8]}
df = pd.DataFrame(data=op_d)
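A minimal sketch of how find_age could be written so that it accepts whole columns (Series) and is then called through DataFrame.eval with the @ prefix. It assumes "age" means whole years from DD (earlier) to D (later), and it passes engine='python', the workaround reported in the comments on the answer. The column names Age and AgeMonths are hypothetical.

```python
import pandas as pd

op_d = {'ID': [1, 2, 3], 'V': ['F', 'G', 'H'], 'AAA': [0, 1, 1],
        'D': ['2019/12/04', '2019/02/01', '2019/01/01'],
        'DD': ['2019-12-01', '2016-05-31', '2015-02-15'],
        'CurrentRate': [7.5, 2, 2], 'NoteRate': [2, 3, 3], 'BBB': [0, 4, 4],
        'Q1': [2, 8, 10], 'Q2': [3, 5, 7], 'Q3': [5, 6, 8]}
df = pd.DataFrame(data=op_d)


def find_age(date_col1, date_col2):
    """Whole years from date_col2 (earlier) to date_col1 (later), element-wise."""
    later = pd.to_datetime(date_col1)
    earlier = pd.to_datetime(date_col2)
    # Subtract one year where the later (month, day) falls before the earlier one.
    not_yet = (later.dt.month < earlier.dt.month) | (
        (later.dt.month == earlier.dt.month) & (later.dt.day < earlier.dt.day)
    )
    return later.dt.year - earlier.dt.year - not_yet.astype(int)


# '@' tells eval to look find_age up in the calling scope (assumed: python engine).
df.eval('Age = @find_age(D, DD)', engine='python', inplace=True)
# Converting years to months needs no custom function: plain arithmetic works.
df.eval('AgeMonths = Age * 12', engine='python', inplace=True)
print(df[['D', 'DD', 'Age', 'AgeMonths']])
```

Because find_age receives Series, the .dt accessor does the date work element-wise, so no Python-level loop is needed.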

Any help or a link to the docs is appreciated.

Helpful links I found that don't address my issue:

Dynamic Expression Evaluation in pandas using pd.eval()

Using local variables with multiple assignments with pandas eval function

Passing arguments to python eval()

cs95
Prish

1 Answer


Functions can be called as usual; you just need to reference them with the @ symbol:

df
   A  B
0  1  0
1  0  0
2  0  1

def my_func(x, y): return x + y

df.eval('@my_func(A, B)')
0    1
1    0
2    1
dtype: int64
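A self-contained version of the example above. The three-row frame is rebuilt from the printed output, and engine='python' is passed explicitly as an assumption, because the comments below report that the default numexpr engine can fail on @-function calls with a TypeError about unhashable Series.

```python
import pandas as pd

# Rebuild the frame shown above.
df = pd.DataFrame({'A': [1, 0, 0], 'B': [0, 0, 1]})


def my_func(x, y):
    # x and y arrive as whole columns (Series), so + works element-wise.
    return x + y


result = df.eval('@my_func(A, B)', engine='python')
print(result)
```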

Of course, this assumes your functions accept Series as arguments. If they don't, wrap your function in a call to np.vectorize as appropriate.
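A minimal sketch of the np.vectorize route, assuming a hypothetical scalar helper years_between that only understands one pair of date strings at a time. np.vectorize returns a plain ndarray, so the hypothetical wrapper age_from_columns rebuilds a Series on the frame's index before handing it back to eval. The D and DD columns come from the question's sample data, and engine='python' is again an assumption based on the comments.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'D': ['2019/12/04', '2019/02/01', '2019/01/01'],
                   'DD': ['2019-12-01', '2016-05-31', '2015-02-15']})


def years_between(later, earlier):
    """Scalar helper (hypothetical): whole years between two date strings."""
    later, earlier = pd.Timestamp(later), pd.Timestamp(earlier)
    # True when the anniversary has not been reached yet in the later year.
    not_yet = (later.month, later.day) < (earlier.month, earlier.day)
    return later.year - earlier.year - int(not_yet)


# Apply the scalar function element by element; force an integer output dtype.
_years_vec = np.vectorize(years_between, otypes=[int])


def age_from_columns(later, earlier):
    # Wrap the ndarray so the result keeps the DataFrame's index.
    return pd.Series(_years_vec(later, earlier), index=later.index)


df['Age'] = df.eval('@age_from_columns(D, DD)', engine='python')
print(df)
```

np.vectorize is a convenience loop, not a speed-up, so a Series-native function is usually preferable when one is easy to write.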

cs95
  • If I have a function such as def max1(x,y): return np.maximum(x,y) and call df.eval('@max1(TT,TT1)'), I get ```TypeError: 'Series' objects are mutable, thus they cannot be hashed```. Do you know why this is the case? – Prish Mar 10 '20 at 15:44
  • I guess the default engine, 'numexpr', can't handle it. I resolved it by using engine='python'. – Prish Mar 10 '20 at 15:58
  • @Prish That seems right to me. Numexpr is generally faster, but there are limitations to its use. – cs95 Mar 10 '20 at 16:33
  • Do you know how to call the round, ceil, or floor functions from eval? E.g. ```df.eval('round(A-B)')```. abs() and sqrt() work fine. – Prish Mar 19 '20 at 15:42
  • @Prish If you just need to round the result, doesn't `df.eval('A-B').round()` work? – cs95 Mar 19 '20 at 19:32
  • Unfortunately, no. It resulted in an error. I am tracking this issue in a different question but thought I'd ask you here since no one has answered it yet. Link: https://stackoverflow.com/questions/60762008/calling-round-ceiling-floor-min-max-in-pandas-eval – Prish Mar 19 '20 at 20:07
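A minimal sketch pulling this comment thread together, assuming the question's sample columns Q1, Q2, Q3: element-wise max/min via np.maximum/np.minimum wrapped in plain functions and called with @ under engine='python', plus two ways to round. max1 matches the comment above; min1, round_, ceil_, floor_ and the new column names are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Q1': [2, 8, 10], 'Q2': [3, 5, 7], 'Q3': [5, 6, 8]})


def max1(x, y):
    return np.maximum(x, y)  # element-wise max of two Series


def min1(x, y):
    return np.minimum(x, y)  # element-wise min of two Series


def round_(s):
    return np.round(s)


def ceil_(s):
    return np.ceil(s)


def floor_(s):
    return np.floor(s)


# engine='python' sidesteps the numexpr "Series objects are mutable" TypeError.
df.eval('MaxQ = @max1(Q1, Q2)', engine='python', inplace=True)
df.eval('MinQ = @min1(Q1, Q2)', engine='python', inplace=True)
df.eval('Rounded = @round_(Q1 / Q2)', engine='python', inplace=True)
df.eval('Ceil = @ceil_(Q3 / Q2)', engine='python', inplace=True)
df.eval('Floor = @floor_(Q3 / Q2)', engine='python', inplace=True)

# Alternative from the thread: evaluate first, then round the resulting Series.
rounded = df.eval('Q1 / Q2').round()
print(df)
```

The second form keeps eval to plain arithmetic, which both engines support, and does the rounding with the ordinary Series.round method.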