How to calculate cumulative sum (reversed) of a Python DataFrame within given groups?

Question

I have a data frame (df_f) with many (n=19) columns that, if conceptually simplified looks something like this:

Basin	(n=17 columns)	Chi
13.0	...	4
13.0	...	8
13.0	...	2
21.0	...	4
21.0	...	6
38.0	...	1
38.0	...	7
38.0	...	2
38.0	...	4

The real data frame has around 70,000 rows and around 60 unique 'Basin' ID values (and of course other columns with numerical data, but we don't need them for this, I still want to keep them though for plotting).

What I would like is the following:

I want calculate the cumulative sum of the 'Chi' value, but reversed (so, going up), AND I want it to be reset for each Basin 'group'. So the values should be:

Basin	(n=17 columns)	Chi_cum
13.0	...	14
13.0	...	10
13.0	...	2
21.0	...	10
21.0	...	6
38.0	...	14
38.0	...	13
38.0	...	6
38.0	...	4

As you can see, in Chi_cum, we progressively add the Chi values going up, but we reset the sum for each 'Basin'.

I know how to do a reverse cumulative sum (df_f['Chi_cum'] = df_f.loc[::-1, 'Chi'].cumsum()[::-1]), but that is for the entire dataframe. Conversely, I know how to use '.groupby', but I can't seem to combine both methods of grouping and summing.

I tried doing something like:

df_f["Chi_cum"] = df_f.groupby(by=['Basin']).sum().iloc[::-1].groupby(level=[0]).cumsum().iloc[::-1]

(as inspired by Pandas Python Groupby Cummulative Sum Reverse), but it does not work!

Could someone please help me figure this out? Thanks!

score 1 · Answer 1 · 2022-03-16T16:45:29.583

1

Apparently you can't actually add ascending=True or reverse=True to cumsum (!?!?), so just reverse before and after cumsum for each group:

df['Chi'] = df.groupby('Basin')['Chi'].apply(lambda x: x[::-1].cumsum()[::-1])

Output:

>>> df
   Basin  Chi
0   13.0   14
1   13.0   10
2   13.0    2
3   21.0   10
4   21.0    6
5   38.0   14
6   38.0   13
7   38.0    6
8   38.0    4

edited Mar 16 '22 at 16:45

answered Mar 16 '22 at 00:24

1

Thank you so much for the fast reply! I will try implementing this tomorrow and check everything looks good :) – g.is.stuck Mar 16 '22 at 00:49
I approved the second answer since it also includes the bottom (8th) element and is a little bit more succinct! I would give you an upvote but I don't have reputation to give that out yet sadly. Thank you for taking the time to help! – g.is.stuck Mar 16 '22 at 16:44
1

Okay, that's fine! :) Note that the last item missing was just a typo in copying the output :) – Mar 16 '22 at 16:45
1

I just got some new rep so I just upvoted you! – g.is.stuck Mar 16 '22 at 16:51

score 1 · Accepted Answer · edited Mar 16 '22 at 00:55

You can try with series groupby

df['new'] = df.loc[::-1, 'Chi'].groupby(df['Basin']).cumsum()
df
Out[858]: 
   Basin (n=17 columns)  Chi  new
0   13.0            ...    4   14
1   13.0            ...    8   10
2   13.0            ...    2    2
3   21.0            ...    4   10
4   21.0            ...    6    6
5   38.0            ...    1   14
6   38.0            ...    7   13
7   38.0            ...    2    6
8   38.0            ...    4    4

How to calculate cumulative sum (reversed) of a Python DataFrame within given groups?

2 Answers2