Cummulative Sum with restart based on condition in Pandas

Question

I am trying to find cumulative sum of a column in pandas based on a reset condition of 2%. That is as and when the cumulative sum exceeds 2% , I reset my cumulative sum to 0 and restart the process of calculating the cumulative sum.

The dataframe I am working with looks like this nifty

Open,High,Low,Close,close_2%,cumsum (resets @ 2%), new_group

4.883,4.883,4.883,4.883,NA,NA,NA

4.856,4.954,4.856,4.954,1.45%,1.45%,0
4.948,4.948,4.945,4.945,0.18%,0.18%,0
4.935,4.935,4.857,4.857,1.78%,3.42%,0
4.856,4.916,4.856,4.915,1.19%,1.19%,1 << data got reset because cumulative sum exceeded 2%.
4.915,4.915,4.856,4.903,0.24%,1.44%,1
4.902,4.902,4.9,4.9,0.06%,1.50%,1
4.9,4.995,4.9,4.954,1.10%,2.60%,1
4.954,4.954,4.954,4.954,0%,0%,2 << data got reset because cumulative sum exceeded 2%.

df.head() looks like so

    Open    High    Low   Close   close_2%  cumsum (resets @ 2%) new_group
0   4.883   4.883   4.883   4.883   NaN     NaN    NaN
1   4.856   4.954   4.856   4.954   1.45%   1.45%   0.0
2   4.948   4.948   4.945   4.945   0.18%   0.18%   0.0
3   4.935   4.935   4.857   4.857   1.78%   3.42%   0.0

Solutions I have tried and did not work are

nifty['cumsum']=0
total = 0
for i, v in nifty.iterrows():
  total += nifty['close_2%'][-1]
  nifty['cumsum'].iloc[-1] = total
 
  if total > 2.0:
    print('total',total)
    total = 0

2nd method

nifty['close_2%']=np.abs(nifty['Close'].pct_change()*100)
nifty['var']=nifty['close_2%'].cumsum() # where 'var' is a temp variable.
nifty['cumsum']=np.where(nifty['var']<2,nifty['var'],nifty['close_2%'])

This method resets the cumulative sum for the first time it exceeds 2% , however after that it just copies close_2% values to cumsum.

Open,High,Low,Close,close_2%,cumsum ,new_group

4.883,4.883,4.883,4.883,NA,NA,NA

4.856,4.954,4.856,4.954,1.45%,1.45%,0
4.948,4.948,4.945,4.945,0.18%,0.18%,0
4.935,4.935,4.857,4.857,1.78%,3.42%,0
4.856,4.916,4.856,4.915,1.19%,1.19%,1 << This reset is achieved 
4.915,4.915,4.856,4.903,0.24%,1.44%(0.24%),1  <<<< values that get posted in cumsum after the first reset is given in () instead of 1.44%.
4.902,4.902,4.9,4.9,0.06%,1.50%(0.06%),1 <<<< same here values in () are posted by program instead of 1.50%
4.9,4.995,4.9,4.954,1.10%,2.60%(1.10%),1 <<<< same here values in () are posted by program instead of 2.60%
4.954,4.954,4.954,4.954,0%,0%,2 << This is not achieved instead now close_2% gets copied into cumsum (resets @ 2%)

The structure of your dataframe is confusing - will you please send a sample of it with `print(df.head().to_dict())`? — , Dec 23 '21 at 02:47
Does [this answer](https://stackoverflow.com/a/65412216/758174) help? — Pierre D, Dec 23 '21 at 02:54
The bottom-line is that, to my knowledge, there is no vectorized way to do cumulative operations with reset based on the cumulative value itself. But `numba` can help make your loop faster (if you accept an initial hit for compile time). — Pierre D, Dec 23 '21 at 03:01

Cummulative Sum with restart based on condition in Pandas

0 Answers0