I am trying to find cumulative sum of a column in pandas based on a reset condition of 2%. That is as and when the cumulative sum exceeds 2% , I reset my cumulative sum to 0 and restart the process of calculating the cumulative sum.
The dataframe I am working with looks like this
nifty
Open,High,Low,Close,close_2%,cumsum (resets @ 2%), new_group
4.883,4.883,4.883,4.883,NA,NA,NA
4.856,4.954,4.856,4.954,1.45%,1.45%,0
4.948,4.948,4.945,4.945,0.18%,0.18%,0
4.935,4.935,4.857,4.857,1.78%,3.42%,0
4.856,4.916,4.856,4.915,1.19%,1.19%,1 << data got reset because cumulative sum exceeded 2%.
4.915,4.915,4.856,4.903,0.24%,1.44%,1
4.902,4.902,4.9,4.9,0.06%,1.50%,1
4.9,4.995,4.9,4.954,1.10%,2.60%,1
4.954,4.954,4.954,4.954,0%,0%,2 << data got reset because cumulative sum exceeded 2%.
df.head() looks like so
Open High Low Close close_2% cumsum (resets @ 2%) new_group
0 4.883 4.883 4.883 4.883 NaN NaN NaN
1 4.856 4.954 4.856 4.954 1.45% 1.45% 0.0
2 4.948 4.948 4.945 4.945 0.18% 0.18% 0.0
3 4.935 4.935 4.857 4.857 1.78% 3.42% 0.0
Solutions I have tried and did not work are
nifty['cumsum']=0
total = 0
for i, v in nifty.iterrows():
total += nifty['close_2%'][-1]
nifty['cumsum'].iloc[-1] = total
if total > 2.0:
print('total',total)
total = 0
2nd method
nifty['close_2%']=np.abs(nifty['Close'].pct_change()*100)
nifty['var']=nifty['close_2%'].cumsum() # where 'var' is a temp variable.
nifty['cumsum']=np.where(nifty['var']<2,nifty['var'],nifty['close_2%'])
This method resets the cumulative sum for the first time it exceeds 2% , however after that it just copies close_2% values to cumsum.
Open,High,Low,Close,close_2%,cumsum ,new_group
4.883,4.883,4.883,4.883,NA,NA,NA
4.856,4.954,4.856,4.954,1.45%,1.45%,0
4.948,4.948,4.945,4.945,0.18%,0.18%,0
4.935,4.935,4.857,4.857,1.78%,3.42%,0
4.856,4.916,4.856,4.915,1.19%,1.19%,1 << This reset is achieved
4.915,4.915,4.856,4.903,0.24%,1.44%(0.24%),1 <<<< values that get posted in cumsum after the first reset is given in () instead of 1.44%.
4.902,4.902,4.9,4.9,0.06%,1.50%(0.06%),1 <<<< same here values in () are posted by program instead of 1.50%
4.9,4.995,4.9,4.954,1.10%,2.60%(1.10%),1 <<<< same here values in () are posted by program instead of 2.60%
4.954,4.954,4.954,4.954,0%,0%,2 << This is not achieved instead now close_2% gets copied into cumsum (resets @ 2%)