Groupby function for cumsum and reset it index

Question

I have simple condition:

if sum for the month // 100:

print sum and reset cumsum index

else:

keep cumsumming

Here is my data:

data = dict(
    Year=['2018', '2018', '2018', '2018', '2018', '2017', '2017', '2017'],
    Month=['08', '08', '04', '05', '05', '06', '02', '01'],
    Money=[26, 50, 25, 45, 20, 36, 84, 24]
)

and here is my attempts:

df = pd.DataFrame(data)
df = df.groupby(['Year', 'Month']).sum()
df['cum_y'] = df.groupby(['Year']).Money.cumsum() 


df['cum_m'] = df.groupby([lambda x: x // 100], level=0).Money.cumsum()

df['cum_m'] = df.groupby(lambda x: [x if x // 100 else None]).Money.cumsum()

df['cum_m'] = df.groupby(['Money']).agg(lambda x: x // 100).cumsum()

and I want something like that:

            Money  cum_y  cum_m (Payout actually)
Year Month
2017 01        24    24     x    (means None)
     02        84   108    108 - reset cumsum counter()
     06        36   144     x    (36)
2018 04        25    25     x    (61)
     05        65    90    126 - reset cumsum counter()
     08        76   166     x    (76)

score 2 · Accepted Answer · edited Jan 17 '19 at 07:16

2

I know that iterating should be avoided whenever possible, but here is a solution using iteration:

total = 0
Cumsum = []
for item in df.Money:
    total += item
    if total < 100:
        Cumsum.append(np.nan)
    else:
        Cumsum.append(total)
        total = 0

df['Cumsum'] = Cumsum

Output:

               Money    Cumsum
Year    Month       
2017    01     24       NaN
        02     84       108.0
        06     36       NaN
2018    04     25       NaN
        05     65       126.0
        08     76       NaN

edited Jan 17 '19 at 07:16

jezrael

822,522
95
1,334
1,252

answered Jan 17 '19 at 07:10

Joe Patten

1,664
1
9
15

So good old plain python is only solution? I was thinking about more panda'nish workaround, meh ~_~ but anyway thank you! – Yu Da Chi Jan 17 '19 at 07:50
According to [this article](https://stackoverflow.com/questions/7837722/what-is-the-most-efficient-way-to-loop-through-dataframes-with-pandas) *(4th answer)* - plain loop in DataFrame element is more efficient **(in 214 times lol)** than dedicated function for iterations like *iterrows* and even *itertuples*! So I definitely steak with your solution, thanks again! =) – Yu Da Chi Jan 17 '19 at 16:58

Groupby function for cumsum and reset it index

1 Answers1