A groupby cumsum that is set to zero and restarts the cumsum when the value is below zero

Question

I'm trying to create a column with cumulative sum that resets to zero when the cumsum gets below zero. I the following data:

id	treatment	value
1	drugs	66
1	drugs	33
1	drugs	-100
1	drugs	11
1	drugs	30
1	drugs	-50

The desired result:

id	treatment	days	cumsum
1	drugs	66	66
1	drugs	33	99
1	drugs	-100	0
1	drugs	11	11
1	drugs	30	41
1	drugs	-50	0

Is there a solution close to this attempt?

df.groupby(['id','treatment']).days.apply(lambda x: 0 if x.cumsum() < 0 else x.cumsum())

score 2 · Answer 1 · answered Nov 13 '21 at 16:12

2

Building off of @BENY's excellent answer here, you can do this:

df['cumsum'] = df.groupby(df['value'].lt(0).astype(int).diff().ne(0).cumsum())['value'].cumsum().clip(lower=0)

Output:

>>> df
   id treatment  value  cumsum
0   1     drugs     66      66
1   1     drugs     33      99
2   1     drugs   -100       0
3   1     drugs     11      11
4   1     drugs     30      41
5   1     drugs    -50       0

answered Nov 13 '21 at 16:12

I don't know if it matters but the OP attempt is `df.groupby(['id','treatment'])...` it seems that the cumsums could be distinct per `id` and `treatment` (which works for this sample since there is only 1 `id` `treatment` pair.) but may not work on the full dataset – Henry Ecker Nov 13 '21 at 16:36
Oh, I didn't notice. You're probably right (although the groupby has no effect on the _sample_ data :) – Nov 14 '21 at 15:49
1

@HenryEcker is right, but thanks for the suggestion! – Sandus Nov 14 '21 at 15:56

score 1 · Accepted Answer · edited Nov 14 '21 at 15:50

1

I will recommend use this numba function this time:

from numba import njit

@njit
def cumli(x, lim):
    total = 0
    result = []
    for i, y in enumerate(x):
        total += y
        if total < lim:
            total = 0
        result.append(total)
    return result

df['cumsum'] = df.groupby(['id','treatment']).days.transform(lambda x: cumli(x.values,0))

edited Nov 14 '21 at 15:50

answered Nov 13 '21 at 16:19

BENY

317,841
20
164
234

But is this vectorized? If it's not, are you not worried about the speed? – Nov 13 '21 at 16:26
2

numba is typically much faster than any pandas methods. – Henry Ecker Nov 13 '21 at 16:34
2

@user17242583 you can test the speed on your side ~ :-) – BENY Nov 13 '21 at 17:09
1

@HenryEcker yes it is ~ :-) – BENY Nov 13 '21 at 17:09
Exactly what I was after, thank you very much! – Sandus Nov 14 '21 at 18:42

A groupby cumsum that is set to zero and restarts the cumsum when the value is below zero

2 Answers2