Conditional sum in Python between multiple columns

Question

I have the following script, taken from a larger analysis of securities data:

returns_columns = []

df_merged[ticker + '_returns'] = df_merged[ticker + '_close'].pct_change(periods=1)
returns_columns.append(ticker + '_returns')

df_merged['applicable_returns_sum'] = (df_merged[returns_columns] > df_merged['return_threshold']).sum(axis=1)

'return_threshold' is a complete series of float numbers.

I've been able to successfully sum each row across the columns listed in returns_columns, but I cannot figure out how to conditionally sum only the values in those columns that are greater than the 'return_threshold' in that row.

This seems like a problem similar to the one shown here, Python Pandas counting and summing specific conditions, but I'm trying to sum based on the changing condition in the returns_columns.

Any help would be much appreciated, thanks as always!

EDIT: ANOTHER APPROACH. This is another approach I tried. The script below has a problem with the ticker argument, which I think is necessary, and it produces an error:

def compute_applicable_returns(row, ticker):
    if row[ticker + '_returns'] >= row['top_return']:
        return row[ticker + '_returns']
    else:
        return 0

df_merged['applicable_top_returns'] = df_merged[returns_columns].apply(compute_applicable_returns, axis=1)

You want to get the sum of numbers greater than a threshold? — Sean Pianka, Oct 14 '18 at 20:57

score 1 · Answer 1 · answered Oct 14 '18 at 20:59


Indexing a DataFrame with a boolean expression such as df > threshold returns a DataFrame in which the values that fail the condition are replaced by NaN. You can then call .sum() on the result, which skips the NaN values by default.

df[df > threshold].sum()
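Note that the one-liner above sums down each column and works most simply with a scalar threshold. For a threshold that changes per row, as in the question, something like the following might work. This is only a minimal sketch: the tickers AAA and BBB and all the numbers are made up for illustration, and it assumes 'return_threshold' is a column of df_merged on the same index.

```python
import pandas as pd

# Hypothetical sample data; the ticker names and values are invented for illustration.
df_merged = pd.DataFrame({
    "AAA_returns": [0.02, -0.01, 0.03],
    "BBB_returns": [0.01, 0.04, -0.02],
    "return_threshold": [0.015, 0.00, 0.05],
})
returns_columns = ["AAA_returns", "BBB_returns"]

returns = df_merged[returns_columns]

# axis=0 aligns the threshold Series with the row index, so every row
# is compared against its own threshold.
above = returns.gt(df_merged["return_threshold"], axis=0)

# where() keeps values where the mask is True and puts NaN elsewhere;
# sum(axis=1) then adds across each row, skipping the NaN values.
df_merged["applicable_returns_sum"] = returns.where(above).sum(axis=1)

print(df_merged)
```

Comparing with .gt(..., axis=0) instead of a plain > matters here, because a plain DataFrame > Series comparison tries to match the Series index against the column names rather than the rows.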


Sean Pianka


Sean, thank you for your reply. I might be misunderstanding what you're suggesting, I'm not experienced with programming - here's what I tried: df_merged['applicable_returns_sum'] = df[returns_columns > df_merged['top_return']].sum() which gave the error, TypeError: invalid type comparison – JoeJack Oct 14 '18 at 21:12

score 0 · Answer 2 · answered Oct 14 '18 at 22:56

I answered the question like this:

def compute_applicable_returns(row, ticker):
    if row[ticker + '_returns'] >= row['return_threshold']:
        return row[ticker + '_returns']
    else:
        return 0

for ticker in tickers:
    df_merged[ticker + '_applicable_returns'] = df_merged.apply(compute_applicable_returns, args=(ticker,), axis=1)
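The loop above produces one '_applicable_returns' column per ticker, so the per-row total still has to be summed from those columns. A possible vectorized version of the same idea, which avoids calling apply on every row, is sketched below. The tickers list and the sample values are hypothetical, and the column 'applicable_returns_sum' is named after the one in the question.

```python
import pandas as pd

# Hypothetical tickers and sample data, invented for illustration.
tickers = ["AAA", "BBB"]
df_merged = pd.DataFrame({
    "AAA_returns": [0.02, -0.01, 0.03],
    "BBB_returns": [0.01, 0.04, -0.02],
    "return_threshold": [0.02, 0.00, 0.05],
})

for ticker in tickers:
    col = df_merged[ticker + "_returns"]
    # Keep the return where it is >= the row's threshold, otherwise use 0,
    # which mirrors the if/else in compute_applicable_returns.
    df_merged[ticker + "_applicable_returns"] = col.where(
        col >= df_merged["return_threshold"], 0
    )

# Sum the per-ticker columns across each row to get the conditional total.
applicable_columns = [ticker + "_applicable_returns" for ticker in tickers]
df_merged["applicable_returns_sum"] = df_merged[applicable_columns].sum(axis=1)

print(df_merged)
```

On a large frame this should generally be faster than the row-wise apply, since each comparison runs on a whole column at once.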
