I have a DataFrame like this:
timestamp variance
2017-07-10 20:42:42 0
2017-07-10 20:42:42 1
2017-07-10 20:42:42 2
2017-07-10 20:42:43 6
2017-07-10 20:42:43 7
2017-07-10 20:42:43 9
2017-07-10 20:42:43 3
2017-07-10 20:42:43 4
2017-07-10 20:42:43 5
2017-07-10 20:42:43 1
2017-07-10 20:42:43 4
2017-07-10 20:42:43 1
2017-07-10 20:42:43 3
2017-07-10 20:42:43 7
2017-07-10 20:42:43 9
I would like to add a new column that increments for each row in which variance is equal or greater than 5. When values drop below 5, the count should decrement instead. If the value reaches 0, it should stay at 0.
This is what it should look like:
timestamp variance cumvar
2017-07-10 20:42:42 0 0
2017-07-10 20:42:42 1 0
2017-07-10 20:42:42 2 0
2017-07-10 20:42:43 6 1
2017-07-10 20:42:43 7 2
2017-07-10 20:42:43 9 3
2017-07-10 20:42:43 3 2
2017-07-10 20:42:43 4 1
2017-07-10 20:42:43 5 2
2017-07-10 20:42:43 1 1
2017-07-10 20:42:43 4 0
2017-07-10 20:42:43 1 0
2017-07-10 20:42:43 3 0
2017-07-10 20:42:43 7 1
2017-07-10 20:42:43 9 2
The closest I've come to doing this is this:
df['cumvar'] = np.where((df['variance'] > 5), 1, -1).cumsum()
But of course, this doesn't apply a minimum value of 0 to the cumulative sum. How can I adapt this to achieve the above?