Issue with user defined functions when using Pandas Data Frames

Question

I have encountered an issue when calling a user defined function when using Pandas DataFrames. Stock market data is read in from a SQLite3 database in the following form:

date high low close

The following code sums high, low and close values for each row and adds a new column 'Sum' to df:

def Sum(h, l, c):
    return h+l+c

df.loc[:, 'Sum'] = Sum(df['high'], df['low'], df['close'])

             high    low  close     Sum
date
2018-01-23  80.65  78.25  79.45  238.35
2018-01-24  81.65  79.50  80.50  241.65
2018-01-25  81.70  80.25  81.10  243.05
2018-01-26  81.25  78.25  78.75  238.25
2018-01-29  70.95  62.25  64.15  197.35

However, if the function is changed to return the maximum value of high, low, close for each row in df an error ("ValueError: The truth value of a Series is ambiguous.") is generated.

def Max(h, l, c):
    return max(h, l, c)

df.loc[:, 'Max'] = Max(df['high'], df['low'], df['close'])

What is the issue with the Max function?

Have you considered just using: `df[['low', 'high', 'close']].max(axis=1)` (and similarly `.sum(axis=1)` ? It's generally best to avoid user functions unless absolutely required... Also look at `.agg` and you can do sum/max at the same time... — Jon Clements, Jan 31 '18 at 12:27

score 0 · Answer 1 · answered Jan 31 '18 at 12:33

0

Jon Clements' comment is the way you should go. However, should you wish to perform more complex complications, pd.DataFrame.apply has this functionality:

Note I have renamed your function to avoid conflicts with built-in functions.

def max_df(h, l, c):
    return max(h, l, c)

df['Max'] = df.apply(lambda row: max_df(row['high'], row['low'], row['close']), axis=1)

This is a good post to learn about these options, if you cannot vectorise your calculation:

Difference between map, applymap and apply methods in Pandas

answered Jan 31 '18 at 12:33

jpp

159,742
34
281
339

Can the lambda expression be modified so that yesterdays close is used? I think possibly a new column would have to be created named 'yclose' where df['yclose'] = df['close'].shift() – w2kpro Feb 01 '18 at 11:32
@w2kpro, not as far as I'm aware. `df.apply` requires an `axis` which is usually along row or column, but not on the entire dataframe. – jpp Feb 01 '18 at 11:35
My original 'Schoolboys' solution was to create a 'for' loop and do something like: df.loc[i:, 'TR'] = tr_df(df.high.iat[i], df.low.iat[i], df.close.iat[i-1]). But this seems to take longer to execute and is not a very elegant solution. tr_df being a user defined function. – w2kpro Feb 01 '18 at 12:00
have a look at `pd.DataFrame.shift`: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.shift.html, I'm sure there's a way to use `apply` on the shifted dataframe, but you may need to fill in `NaN` values for a row at the top or bottom. – jpp Feb 01 '18 at 12:03
I have tried a shifted 'close' substitution in the lambda expression, but it generates an 'Ambiguous' exception. At the moment, it looks as if the extra column will have to be the way to go. I shall investigate if creating a temporary copy of 'yclose' could be created so that it is not permanently written to df. – w2kpro Feb 01 '18 at 12:18

Issue with user defined functions when using Pandas Data Frames

1 Answers1