I have four columns in a data frame like so:
A B C D
75472 d1 x -36.0 0.0
75555 d2 x -38.0 0.0
75638 d3 x -18.0 0.0
75721 d4 x -18.0 1836.0
75804 d5 x 1151.0 0.0
75887 d6 x 734.0 0.0
75970 d7 x -723.0 0.0
And I want to conditionally sum D
by:
- D has a value, then D
- Else, take value from previous row for D plus C
So for above, D would be [-36, -74, -92, 1836, 2987, 3721, 2998]
.
I've been able to do this successfully with a for loop
for i, row in me.iterrows():
try:
if row['D'] > 0:
step1 = me.loc[(me['B'] == row['B']) & (me['A'] == row['A']), 'output'].iloc[0]
me_copy.iloc[i, me_copy.columns.get_loc('output')] = step1
else:
step1 = me.loc[(me['B'] == row['B']) & (me['A'] == (row['A'] - pd.DateOffset(days=1))), 'step1'].iloc[0]
receipts_adjustments_sales = me.loc[(me['B'] == row['B']) & (me['A'] == row['A']), 'C'].iloc[0]
me_copy.iloc[i, me_copy.columns.get_loc('output')] = step1 + receipts_adjustments_sales
except:
me_copy.iloc[i, me_copy.columns.get_loc('output')] = 0
But the for loop is obviously really expensive, anti-pattern and basically doesn't run over my whole data frame. I'm trying to copy an excel function here that has basically been written over a panel of data, and for the life of me I cannot figure out how to do this with:
pd.Series.shift()
pd.Series.rolling()
- Simply calculating different column values
I was attempting to do it with shift()
for a while, but I realized that I kept having to make a separate column for each row, and that's why I went with a for loop.
Generalized to Groups
df.loc[:, 'A_group'] = df.groupby(['A'])[df['D'] != 0].cumsum()
df.loc[:, 'E'] = df['D'].mask(df['D'] == 0).combine_first(df['C'])
df.loc[:, 'F'] = me.groupby(['A', 'A_group'])['E'].cumsum()
Thanks to Scott Boston for the help!