Cumulative sum on time series split by consecutive negative or positive values

Question

I have time series data that looks like this:

date        values
2017-05-01      1
2017-05-02      0.5
2017-05-03     -2
2017-05-04     -1
2017-05-05     -1.25
2017-05-06      0.5
2017-05-07      0.5

I would like to add a field that computes the cumulative sum of my time series by trend: sum of consecutive positive values, sum of consecutive negative values. Something that looks like this:

date        values   newfield
2017-05-01   1          1
2017-05-02   0.5        1.5
2017-05-03  -2         -2
2017-05-04  -1         -3
2017-05-05  -1.25      -4.25
2017-05-06   0.5        0.5
2017-05-07   0.5        1

At the moment I'm using shift followed by conditions, but this is inefficient and I'm realizing it isn't a good approach.

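import pandas as pd

def pn(x, y):
    # 1 if the current and previous values share a sign, else 0
    if x < 0 and y < 0:
        return 1
    if x > 0 and y > 0:
        return 1
    else:
        return 0

def consum(x, y, z):
    # x: current value, y: previous value, z: same-sign flag from pn
    if z == 0:
        return x
    if y == 1:
        return x + y
    # any other case falls through and returns None (shown as nan below)

test = pd.read_csv("./test.csv", sep=";")
test['temp'] = test.Value.shift(1)
test['temp2'] = test.apply(lambda row: pn(row['Value'], row['temp']), axis=1)
test['temp3'] = test.apply(lambda row: consum(row['Value'], row['temp'], row['temp2']), axis=1)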

    Date        Value     temp  temp2   temp3
    2017-05-01   1       nan    0       1
    2017-05-02   0.5     1      1       1.5
    2017-05-03  -2       0      0      -2
    2017-05-04  -1      -2      1       nan
    2017-05-05  -1.25   -1      1       nan
    2017-05-06   0.5    -1.25   0       0.5
    2017-05-07   0.5     0.5    1       nan

After that I'm lost. I could continue to shift my values and have lots of if statements but there must be a better way.

I'm voting to close this question as off-topic because SO is not a code writing service. You must show the code you've tried, and your question needs to specifically state where exactly you're having trouble. — stevieb, Jul 03 '17 at 17:16
I have tried using shift and adding a conditional positive/negative field. I just can't figure out the part that groups by consecutive values without losing my daily detail. — nickfrenchy, Jul 03 '17 at 17:19

score 5 · Accepted Answer · answered Jul 03 '17 at 17:57

Putting 0 in with the positives, you can use the shift-compare-cumsum pattern:

In [33]: sign = df["values"] >= 0

In [34]: df["vsum"] = df["values"].groupby((sign != sign.shift()).cumsum()).cumsum()

In [35]: df
Out[35]: 
         date  values  vsum
0  2017-05-01    1.00  1.00
1  2017-05-02    0.50  1.50
2  2017-05-03   -2.00 -2.00
3  2017-05-04   -1.00 -3.00
4  2017-05-05   -1.25 -4.25
5  2017-05-06    0.50  0.50
6  2017-05-07    0.50  1.00

which works because (sign != sign.shift()).cumsum() gives us a new number for each contiguous group:

In [36]: sign != sign.shift()
Out[36]: 
0     True
1    False
2     True
3    False
4    False
5     True
6    False
Name: values, dtype: bool

In [37]: (sign != sign.shift()).cumsum()
Out[37]: 
0    1
1    1
2    2
3    2
4    2
5    3
6    3
Name: values, dtype: int64
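
For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch of the same shift-compare-cumsum pattern. The frame is rebuilt from the sample data in the question, and the helper name `cumsum_by_sign_run` is only an illustrative choice, not something pandas provides.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": pd.date_range("2017-05-01", periods=7, freq="D"),
    "values": [1, 0.5, -2, -1, -1.25, 0.5, 0.5],
})

def cumsum_by_sign_run(s: pd.Series) -> pd.Series:
    """Hypothetical helper: cumulative sum that restarts at every sign change.

    Zero is grouped with the positives, as in the answer above.
    """
    sign = s >= 0
    # True wherever the sign differs from the previous row (first row included);
    # the running total of those flags labels each contiguous run.
    run_id = (sign != sign.shift()).cumsum()
    return s.groupby(run_id).cumsum()

df["vsum"] = cumsum_by_sign_run(df["values"])
print(df)
```

Because the grouping key is computed from the series itself, the daily rows are all kept; only the running total resets when the sign flips.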

score 3 · Answer 2 · answered Jul 03 '17 at 18:21

Create the groups:

import numpy as np
g = np.sign(df['values']).diff().ne(0).cumsum()
g

Output:

0    1
1    1
2    2
3    2
4    2
5    3
6    3
Name: values, dtype: int64

Now use g as the groupby key with cumsum:

df.groupby(g).cumsum()

Output:

   values
0    1.00
1    1.50
2   -2.00
3   -3.00
4   -4.25
5    0.50
6    1.00
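
One difference between the two answers that the sample data does not exercise is how zeros are handled. The sketch below uses made-up values (not from the question) to compare them: the `>= 0` test groups a zero with the positives, while `np.sign` gives zero its own sign, so a zero starts a new run there.

```python
import numpy as np
import pandas as pd

# Hypothetical values chosen only to include zeros; not from the question.
s = pd.Series([1.0, 0.0, 2.0, -1.0, 0.0, -3.0])

# First answer: zero is treated as positive.
sign = s >= 0
by_bool = s.groupby((sign != sign.shift()).cumsum()).cumsum()

# Second answer: np.sign returns -1, 0 or 1, so zero forms its own group.
by_npsign = s.groupby(np.sign(s).diff().ne(0).cumsum()).cumsum()

print(pd.DataFrame({"values": s, "ge_zero": by_bool, "np_sign": by_npsign}))
```

If the series can contain exact zeros, it is worth deciding which of these behaviors is wanted before picking one of the two forms.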
