1

I'm using an existing solution to try to produce a cumulative sum that resets after a certain value (in this case >= 16). Currently I get the following output, but there are cases when the cumsum is still greater than 16.

Size    cumsum
8       8
8       16     ---correct
8       8
8       16     ---correct
7       7
6       13     (should be reset here since next value causes cumsum >16)
7       20     ---incorrect
6       6
5       11
2       13

The code I am using is:

df = pd.DataFrame({'Size':[8,8,8,8,7,6,7,6,5,2]})

ls = []  
cumsum = 0
last_reset = 0
for _, row in df.iterrows():
    cumsum = cumsum + row.Size
    ls.append(cumsum)
    if cumsum >= 16:
        last_reset = cumsum
        cumsum = 0 
df['cumsum'] = ls

Any ideas how to correct this?

Ellio
  • 117
  • 8
  • Does this answer your question? [How to reset cumsum after change in sign of values?](https://stackoverflow.com/questions/49390300/how-to-reset-cumsum-after-change-in-sign-of-values) – FBruzzesi May 04 '20 at 10:31
  • I think its more complicated than that @FBruzzesi – yatu May 04 '20 at 10:31
  • 1
    Does this answer your Q : https://stackoverflow.com/questions/57510604/perfrom-cumulative-sum-over-a-column-but-reset-to-0-if-sum-become-negative-in-pa – Sreeram TP May 04 '20 at 10:42
  • Does this answer your question? [Perfrom cumulative sum over a column but reset to 0 if sum become negative in Pandas](https://stackoverflow.com/questions/57510604/perfrom-cumulative-sum-over-a-column-but-reset-to-0-if-sum-become-negative-in-pa) – Sreeram TP May 04 '20 at 10:52

3 Answers3

3
df = pd.DataFrame({'Size':[8,8,8,8,7,6,7,6,5,2]})

ls = []  
cumsum = 0
last_reset = 0
for _, row in df.iterrows():
    if cumsum + row.Size <= 16:
        cumsum += row.Size
    else:
        last_reset = cumsum
        cumsum = row.Size
    ls.append(cumsum)

df['cumsum'] = ls

Result:

    Size    cumsum
0   8       8
1   8       16
2   8       8
3   8       16
4   7       7
5   6       13
6   7       7
7   6       13
8   5       5
9   2       7
ilyankou
  • 1,309
  • 8
  • 13
1

A slight variation of accepted answer here : Perfrom cumulative sum over a column but reset to 0 if sum become negative in Pandas can help you solve this.

Since this solution makes use of numba, it will be way lot faster than normal iterrows solution

@njit
def cumli(x, lim):
    total = 0
    result = []
    for i, y in enumerate(x):
        total += y
        if (total) > lim:
            total = y
        result.append(total)
    return result

cumli(df.Size.values, 16)

# [8, 16, 8, 16, 7, 13, 7, 13, 5, 7]
Sreeram TP
  • 11,346
  • 7
  • 54
  • 108
1

I think, a good solution is a "function with memory" to be applied to each value from your column of interest.

Define the following function to calculate your sum:

def myCumSum(val):
    myCumSum.sum += val
    if myCumSum.sum > 16:
        myCumSum.sum = val
    return myCumSum.sum

Note that this function has an attribute (sum), that should be set before the first call of this function.

myCumSum.sum = 0

Then apply it to Size column and save the result as cumsum column:

df['cumsum'] = df.Size.apply(myCumSum)
Valdi_Bo
  • 30,023
  • 4
  • 23
  • 41