1

Is there a way to cumsum values in column based on the value of another column but after the condition is no more respected, resetting the cumsum? In the below image, the conditioning factor is on 'Rbin' being equal to one.

enter image description here

random_data_frame = pandas.DataFrame()
random_data_frame['Rbin'] = [0,1,1,1,0,0,1,0]
random_data_frame['Rmomentum'] = [-0.07,0.03,0.06,0.005,-0.008,-0.8,0.8,-0.5]
BloomShell
  • 833
  • 1
  • 5
  • 20
  • 2
    maybe but we cant test since the data is an [image](https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-errors-when-asking-a-question) which is not reproducible:( – anky Jan 30 '21 at 17:36
  • I though it was very easily reproducible. I left an pandas dataframe as example to compute last column I have illustrated in the image – BloomShell Jan 30 '21 at 17:41
  • https://stackoverflow.com/questions/41420822/python-pandas-conditional-cumulative-sum/41423048 answer number 2 has your code. Just adapt that and put it in your version. – Paul Brennan Jan 30 '21 at 18:09
  • @PaulBrennan yeah, I saw that answer too, but the difficoult was on the idea on forming the column for applying the groupby cumsum function. Anyway, thanks for your help :D – BloomShell Jan 30 '21 at 18:42

1 Answers1

2

You can create an ad-hoc index to identify continuous sequences of Rbin values, then use groupby and cumsum on those sequences and set to np.nan values of the cumulative sum where Rbin is zero.

random_data_frame['new_id'] = (random_data_frame.Rbin.diff() != 0).cumsum()
random_data_frame['cumulative_sum'] = random_data_frame.groupby('new_id')['Rmomentum'].cumsum().reset_index()['Rmomentum']
random_data_frame.loc[random_data_frame.Rbin == 0, 'cumulative_sum'] = np.nan

This is the result on your example:

   Rbin  Rmomentum  new_id  cumulative_sum
0     0     -0.070       1             NaN
1     1      0.030       2           0.030
2     1      0.060       2           0.090
3     1      0.005       2           0.095
4     0     -0.008       3             NaN
5     0     -0.800       3             NaN
6     1      0.800       4           0.800
7     0     -0.500       5             NaN

A compressed version is:

random_data_frame['cumulative_sum'] = np.where(
    random_data_frame.Rbin == 0,
    np.nan,
    random_data_frame.groupby((random_data_frame.Rbin.diff() != 0).cumsum())['Rmomentum'].cumsum().reset_index()['Rmomentum']
)
PieCot
  • 3,564
  • 1
  • 12
  • 20