pandas frame cumulative sum conditioned

Question

Is there a way to cumsum values in column based on the value of another column but after the condition is no more respected, resetting the cumsum? In the below image, the conditioning factor is on 'Rbin' being equal to one.

random_data_frame = pandas.DataFrame()
random_data_frame['Rbin'] = [0,1,1,1,0,0,1,0]
random_data_frame['Rmomentum'] = [-0.07,0.03,0.06,0.005,-0.008,-0.8,0.8,-0.5]

maybe but we cant test since the data is an [image](https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-errors-when-asking-a-question) which is not reproducible:( — anky, Jan 30 '21 at 17:36
I though it was very easily reproducible. I left an pandas dataframe as example to compute last column I have illustrated in the image — BloomShell, Jan 30 '21 at 17:41
https://stackoverflow.com/questions/41420822/python-pandas-conditional-cumulative-sum/41423048 answer number 2 has your code. Just adapt that and put it in your version. — Paul Brennan, Jan 30 '21 at 18:09
@PaulBrennan yeah, I saw that answer too, but the difficoult was on the idea on forming the column for applying the groupby cumsum function. Anyway, thanks for your help :D — BloomShell, Jan 30 '21 at 18:42

PieCot · Accepted Answer · 2021-01-30T18:31:24.933

You can create an ad-hoc index to identify continuous sequences of Rbin values, then use groupby and cumsum on those sequences and set to np.nan values of the cumulative sum where Rbin is zero.

random_data_frame['new_id'] = (random_data_frame.Rbin.diff() != 0).cumsum()
random_data_frame['cumulative_sum'] = random_data_frame.groupby('new_id')['Rmomentum'].cumsum().reset_index()['Rmomentum']
random_data_frame.loc[random_data_frame.Rbin == 0, 'cumulative_sum'] = np.nan

This is the result on your example:

   Rbin  Rmomentum  new_id  cumulative_sum
0     0     -0.070       1             NaN
1     1      0.030       2           0.030
2     1      0.060       2           0.090
3     1      0.005       2           0.095
4     0     -0.008       3             NaN
5     0     -0.800       3             NaN
6     1      0.800       4           0.800
7     0     -0.500       5             NaN

A compressed version is:

random_data_frame['cumulative_sum'] = np.where(
    random_data_frame.Rbin == 0,
    np.nan,
    random_data_frame.groupby((random_data_frame.Rbin.diff() != 0).cumsum())['Rmomentum'].cumsum().reset_index()['Rmomentum']
)

pandas frame cumulative sum conditioned

1 Answers1