I have a time series data that looks like this:
date values
2017-05-01 1
2017-05-02 0.5
2017-05-03 -2
2017-05-04 -1
2017-05-05 -1.25
2017-05-06 0.5
2017-05-07 0.5
I would like to add a field that computes the cumulative sum of my time series by trend: sum of consecutive positive values, sum of consecutive negative values. Something that looks like this:
date values newfield
2017-05-01 1 1 |
2017-05-02 0.5 1.5 |
2017-05-03 -2 -2 |
2017-05-04 -1 -3 |
2017-05-05 -1.25 -4.25 |
2017-05-06 0.5 0.5 |
2017-05-07 0.5 1 |
At the moment, I'm trying to use shift and then having conditions but this is really not efficient and I am realizing it is really not a good approach.
def pn(x, y):
if x < 0 and y < 0:
return 1
if x > 0 and y > 0:
return 1
else:
return 0
def consum(x,y,z):
if z == 0:
return x
if y == 1:
return x+y
test = pd.read_csv("./test.csv", sep=";")
test['temp'] = test.Value.shift(1)
test['temp2'] = test.apply(lambda row: pn(row['Value'], row['temp']), axis=1)
test['temp3'] = test.apply(lambda row: consum(row['Value'], row['temp'], row['temp2']), axis=1)
Date Value temp temp2 temp3
2017-05-01 1 nan 0 1
2017-05-02 0.5 1 1 1.5
2017-05-03 -2 0 0 -2
2017-05-04 -1 -2 1 nan
2017-05-05 -1.25 -1 1 nan
2017-05-06 0.5 -1.25 0 0.5
2017-05-07 0.5 0.5 1 nan
After that I'm lost. I could continue to shift my values and have lots of if statements but there must be a better way.