0

I have the following dataframe:

             pct_day    True_False
Date        
2018-01-02  NaN False
2018-01-03  0.006399    False
2018-01-04  0.004029    False
2018-01-05  0.007034    False
2018-01-08  0.001662    False
... ... ...
2020-01-23  0.001141    True
2020-01-24  -0.009042   True
2020-01-27  -0.015731   True
2020-01-28  0.010054    True
2020-01-29  -0.000867   False
522 rows × 2 columns

What i want to do is cumsum() on False and restart when it toggles to True. I want the cumulative sum to be only on consecutive True or False.

A similar question was asked here: How to groupby consecutive values in pandas DataFrame

for row in data3:
    if data3.True_False == data3.True_False.shift():
        print(data3.pct_day.cumsum())

But this produces an error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How can i get the iteration to continue only if the condition is true?

Slartibartfast
  • 1,058
  • 4
  • 26
  • 60

1 Answers1

1

You can do it this way.

a = df.True_False.ne(df.True_False.shift()).cumsum()
df['cum_val']= df.groupby(a)['pct_day'].cumsum()
print(df)

Input

        Date    pct_day     True_False
0   1/2/2018    NaN         False
1   1/3/2018    0.006399    False
2   1/4/2018    0.004029    False
3   1/5/2018    0.007034    False
4   1/8/2018    0.001662    False
5   1/23/2020   0.001141    True
6   1/24/2020   -0.009042   True
7   1/27/2020   -0.015731   True
8   1/28/2020   0.010054    True
9   1/29/2020   -0.000867   False

Output

        Date    pct_day True_False  cum_val
0   1/2/2018    NaN          False  NaN
1   1/3/2018    0.006399    False   0.006399
2   1/4/2018    0.004029    False   0.010428
3   1/5/2018    0.007034    False   0.017462
4   1/8/2018    0.001662    False   0.019124
5   1/23/2020   0.001141    True    0.001141
6   1/24/2020   -0.009042   True    -0.007901
7   1/27/2020   -0.015731   True    -0.023632
8   1/28/2020   0.010054    True    -0.013578
9   1/29/2020   -0.000867   False   -0.000867
moys
  • 7,747
  • 2
  • 11
  • 42
  • It worked! I have a followup question, `a =df.True_False.ne(df.True_False.shift()).cumsum()`. Curious as to why did you use `ne`. Since does it not mean as long as `True` is not equal to `True` then do cumsum()? – Slartibartfast Jan 31 '20 at 04:01
  • 2
    In this case what it means is that `df.True_False` is `ne(not equal)` to `df.True_False.shift()`. Basically value in each row is not equal to value in the next row. `ne` is same as `!=` but the syntax is a little diferent, if you want o use `!=` you should do it like this --> `(df.True_False!= df.True_False.shift()).cumsum()` – moys Jan 31 '20 at 04:07