I am trying to develop the following filter with a pandas dataframe:
- I have four columns, `A`, `B`, `A_prime` and `B_prime`;
- if `A` or `B` is less than a threshold, `C`, then I would like to find the sum of `A_prime` and `B_prime` and allocate it to the maximum of `A_prime` and `B_prime`, while setting the minimum of `A_prime` and `B_prime` to zero.
How would I write this as a pandas aggregation function?
A working example, written inefficiently, is shown below:
import pandas as pd
import numpy as np

data = {
    "A": list(np.abs(np.random.randn(10))),
    "B": list(np.abs(np.random.randn(10))),
    "A_prime": list(np.abs(np.random.randn(10))),
    "B_prime": list(np.abs(np.random.randn(10))),
}
df = pd.DataFrame.from_dict(data)
C = 0.2  # threshold

print("BEFORE:")
print(df)

for index, row in df.iterrows():
    if row["A"] < C or row["B"] < C:
        total = row["A_prime"] + row["B_prime"]
        # The column whose unprimed value (A or B) is larger receives the total.
        max_idx = np.argmax([row["A"], row["B"]])
        # Write through df.loc: assigning to `row` is not guaranteed to update df.
        if max_idx == 0:
            df.loc[index, ["A_prime", "B_prime"]] = [total, 0.0]
        else:
            df.loc[index, ["A_prime", "B_prime"]] = [0.0, total]

print("")
print("AFTER:")
print(df)
Output:
BEFORE:
          A         B   A_prime   B_prime
0  0.182445  0.924890  1.563398  0.562325
1  0.252587  0.273637  0.515395  0.538876
2  1.369412  1.985702  1.813962  1.643794
3  0.834666  0.143880  0.860673  0.372468
4  1.380012  0.715774  0.022681  0.892717
5  0.582497  0.477100  0.956821  1.134613
6  0.083045  0.322060  0.362513  1.386124
7  1.384267  0.251577  0.639843  0.458650
8  0.375456  0.412320  0.661661  0.086588
9  0.079226  0.385621  0.601451  0.837827

AFTER:
          A         B   A_prime   B_prime
0  0.182445  0.924890  0.000000  2.125723
1  0.252587  0.273637  0.515395  0.538876
2  1.369412  1.985702  1.813962  1.643794
3  0.834666  0.143880  1.233141  0.000000
4  1.380012  0.715774  0.022681  0.892717
5  0.582497  0.477100  0.956821  1.134613
6  0.083045  0.322060  0.000000  1.748638
7  1.384267  0.251577  0.639843  0.458650
8  0.375456  0.412320  0.661661  0.086588
9  0.079226  0.385621  0.000000  1.439278
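
For comparison, one possible loop-free version of the rule is sketched below. It is a row-wise transformation built from boolean masks and `np.where`, not an aggregation in the `groupby`/`agg` sense. The function name `reallocate` and its `threshold` parameter are hypothetical, chosen only for illustration. The sketch follows the working example, where the receiving column is picked by comparing `A` and `B`. If the receiver should be picked by comparing `A_prime` and `B_prime`, as the prose reads, the `a_wins` line would presumably compare those columns instead.

```python
import numpy as np
import pandas as pd


def reallocate(df, threshold):
    """Return a copy of df with A_prime/B_prime merged where A or B < threshold.

    Hypothetical helper: mirrors the loop in the question, where the column
    whose unprimed value (A or B) is larger receives the combined total.
    """
    out = df.copy()
    # Rows where the rule applies.
    below = (df["A"] < threshold) | (df["B"] < threshold)
    total = df["A_prime"] + df["B_prime"]
    # Ties go to A, matching np.argmax returning the first maximum.
    a_wins = df["A"] >= df["B"]
    out["A_prime"] = np.where(below, np.where(a_wins, total, 0.0), df["A_prime"])
    out["B_prime"] = np.where(below, np.where(a_wins, 0.0, total), df["B_prime"])
    return out


# First four rows of the BEFORE table above.
sample = pd.DataFrame({
    "A": [0.182445, 0.252587, 1.369412, 0.834666],
    "B": [0.924890, 0.273637, 1.985702, 0.143880],
    "A_prime": [1.563398, 0.515395, 1.813962, 0.860673],
    "B_prime": [0.562325, 0.538876, 1.643794, 0.372468],
})
result = reallocate(sample, threshold=0.2)
print(result)
```

Returning a copy leaves the input frame untouched, which makes it easy to check the result against the loop version on the same data.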