1

The question I have is closely related to this post. Assume I have the following dataset:

df = pd.DataFrame({"A":range(1,10), "B":range(5,14), "Group": 
[1,1,2,2,2,2,3,3,3],"C":[0,0,10,0,0,16,0,0,22], "last":[0,1,0,0,0,1,0,0,1], 
"Want": [19.25,8,91.6,71.05,45.85,16,104.95,65.8,22]})

The last observation for the group is straight forward. This is how the code looks like:

def calculate(df):
if (df.last == 1):
    value = df.loc["A"] + df.loc["B"]
else:

for all other observation PER GROUP, the row value is calculated as follows:

value = (df.loc[i-1, "C"] + 3 * df.loc[i, "A"] + 1.65 * df.loc[i, "B"])
    return value

To further clarify, these are the formulas for calculating the Want column for Group 2 using excel: F4="F5+(3*A4)+(1.65*B4)", F5="F6+(3*A5)+(1.65*B5)", F6="F7+(3*A6)+(1.65*B6)", F7="A7+B7". There's some kind of "recursive" nature to it, which is why I thought of the "for loop"

I would really appreciate a solution where it's consistent with the first if statement. That is

value = something

rather than the function returning a data frame or something like that, so that I can call the function using the following

df["value"] = df.apply(calculate, axis=1)

Your help is appreciated. Thanks

rafaelc
  • 57,686
  • 15
  • 58
  • 82
xyz
  • 31
  • 5

1 Answers1

0

You don't need apply here. Usually, apply is very slow and you'll want to avoid that.

Problems with this recursive characteristic, however, are usually hard to vectorize. Thankfully, yours can be solved using a reversed cumsum and np.where

df['Want'] = np.where(df['last'] == 1, df['A'] + df['B'], 3*df['A'] + 1.65*df['B'])
df['Want'] = df[::-1].groupby('Group')['Want'].cumsum()
rafaelc
  • 57,686
  • 15
  • 58
  • 82