0

I am facing an issue with the following dataset:

item                  price       
1                     1706
2                     210
3                     1664
4                     103
5                     103
6                     314
7                     1664
8                     57
9                     140
10                    1628
11                    688
12                    180
13                    604
14                    86
15                    180
16                    86
17                    1616
18                    832
19                    1038
20                    57
21                    2343
22                    151
23                    328
24                    328
25                    57
26                    86
27                    1706
28                    604
29                    609
30                    86
31                    0
32                    57
33                    302
34                    328

I want to have a cumulative sum column which "resets" each time it reaches the threshold (read not exceed it, it is fine to have a big gap between the last cumsum number and the threshold as long as it does not exceed it).

I have tried the following code:

threshold = (7.17*1728)*0.75  #this is equal to 9292.32
df['cumsum'] = df.groupby((df['price'].cumsum()) // threshold)['price'].cumsum()

This output the following:

item                  price             cumsum    
1                     1706              1706
2                     210               1916
3                     1664              3580
4                     103               3683
5                     103               3786
6                     314               4100
7                     1664              5764
8                     57                5821
9                     140               5961
10                    1628              7589
11                    688               8277
12                    180               8757
13                    604               9061
14                    86                9147
15                    180               9327 #exceeds threshold
16                    86                9413 #
17                    1616              1616
18                    832               2448
19                    1038              3486
20                    57                3543
21                    2343              5886
22                    151               6037
23                    328               6365
24                    328               6693
25                    57                6750
26                    86                6836
27                    1706              8542
28                    604               9146
29                    609               9755 #exceeds threshold same below
30                    86                9841 #
31                    0                 9841 #
32                    57                9898 #
33                    302               10200 #
34                    328               328

My expected result would be the following instead (for the first part for example):

item                  price             cumsum    
1                     1706              1706
2                     210               1916
3                     1664              3580
4                     103               3683
5                     103               3786
6                     314               4100
7                     1664              5764
8                     57                5821
9                     140               5961
10                    1628              7589
11                    688               8277
12                    180               8757
13                    604               9061
14                    86                9147
15                    180               180 #
16                    86                266 #

What do I need to change in order to get this result? also i would appreciate any explanation as to why the above code does not work.

Thank you in advance.

  • You can check [Restart cumsum and get index if cumsum more than value](https://stackoverflow.com/questions/56904390/restart-cumsum-and-get-index-if-cumsum-more-than-value#56904650) – anky Jan 04 '21 at 09:56
  • I tested your solution and the output is not the same as you put, it actually resembles your expected output – Dani Mesejo Jan 04 '21 at 09:59
  • your code does not work well, because you want to reset to 0 to cumsum value, but you create groups refer to previous groups values by using `df['price'].cumsum())`. – Ferris Jan 04 '21 at 10:37
  • you should try Google: search for `pandas cumulative sum with reset site:stackoverflow.com`; it yields over 1,200 results. – Pierre D Jan 05 '21 at 06:38

2 Answers2

0

Maybe it costs a lot, but it can work...

threshold = (7.17*1728)*0.75  #this is equal to 9292.32
df['cumsum'] = df['price'].cumsum()

# handle the cumsum which is gt threshold by loops
n = 1
while True:
    print(n)
    cond = df['cumsum'].ge(threshold)
    if cond.sum():
        df.loc[cond, 'cumsum'] = df.loc[cond, 'price'].cumsum()
    else:
        break
    n += 1
Ferris
  • 5,325
  • 1
  • 14
  • 23
0

Thank you for all the replies and feedback.

I went ahead with the below code which solves my issue:

ls = []
cumsum = 0
lastreset = 0
for _, row in df.iterrows():
    if cumsum + row.price <= threshold:
        cumsum += row.price
    else:
        last_reset = cumsum
        cumsum = row.price
    ls.append(cumsum)

df['cumsum'] = ls