Fully reproducible dataframe and previous attempts at the end of the question. A similare question has been asked and answered for R
I've got a dataframe with a timestamp dates
and values a
that are integers in the range [-10, 10]
.
In the column named above
I've identified periods of time where a > 0
. Now I'd like to retrieve all beginnings and ends af such periods.
Data
dates a above pch per
0 2020-01-01 0 0 0.0 0.0
1 2020-01-02 -7 0 0.0 0.0
2 2020-01-03 -9 0 0.0 1.0
3 2020-01-06 1 1 1.0 0.0
4 2020-01-07 1 1 0.0 0.0
5 2020-01-08 2 1 0.0 0.0
6 2020-01-09 1 1 0.0 0.0
7 2020-01-10 1 1 0.0 -1.0
8 2020-01-13 -2 0 -1.0 1.0
9 2020-01-14 8 1 1.0 0.0
10 2020-01-15 10 1 0.0 0.0
I have been able to do so using an extremely cumbersome, and possibly unstable, way using for loops and identifying period changes in the column pch
. To make it arguably easier to index the rows in for loops, I've shifted the same column into a new column named per
. And running the snippet below will give med the desired output, which is:
[[Timestamp('2020-01-06 00:00:00'), Timestamp('2020-01-10 00:00:00')],
[Timestamp('2020-01-14 00:00:00'), Timestamp('2020-01-15 00:00:00')]]
But as you will see, the procedure is far from elegant. So it would be fantastic if any of you pandas pros would have any other suggestions on how to do this.
Complete code:
import pandas as pd
from pandas import Timestamp
df = pd.DataFrame({'dates': {0: Timestamp('2020-01-01 00:00:00'),
1: Timestamp('2020-01-02 00:00:00'),
2: Timestamp('2020-01-03 00:00:00'),
3: Timestamp('2020-01-06 00:00:00'),
4: Timestamp('2020-01-07 00:00:00'),
5: Timestamp('2020-01-08 00:00:00'),
6: Timestamp('2020-01-09 00:00:00'),
7: Timestamp('2020-01-10 00:00:00'),
8: Timestamp('2020-01-13 00:00:00'),
9: Timestamp('2020-01-14 00:00:00'),
10: Timestamp('2020-01-15 00:00:00')},
'a': {0: 0, 1: -7, 2: -9, 3: 1, 4: 1, 5: 2, 6: 1, 7: 1, 8: -2, 9: 8, 10: 10},
'above': {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 0, 9: 1, 10: 1},
'pch': {0: 0.0,
1: 0.0,
2: 0.0,
3: 1.0,
4: 0.0,
5: 0.0,
6: 0.0,
7: 0.0,
8: -1.0,
9: 1.0,
10: 0.0},
'per': {0: 0.0,
1: 0.0,
2: 1.0,
3: 0.0,
4: 0.0,
5: 0.0,
6: 0.0,
7: -1.0,
8: 1.0,
9: 0.0,
10: 0.0}})
# extract beriod starts and ends
# containers
p_s = []
p_e = []
# find beginning of periods where
# previous row for df['per'] = 1, and
# current row for df['per'] = -1
for i, p in enumerate(df['a'][1:], 1):
#print(df['a'].iat[i-1])
if df['per'].iat[i-1]==1:
#print(df['dates'].iat[i])
p_s.append(df['dates'].iat[i])
if df['per'].iat[i]==-1:
p_e.append(df['dates'].iat[i])
# every period should have a beginning and and end.
# so if there are more starts than ends, the last
# date available is appended to p_3
if len(p_e) < len(p_s):
p_e.append(df['dates'].iat[-1])
# transform a list of starts and a list of ends
# into as list of starts and ends
p_corrected = []
for i, p in enumerate(p_s):
#print(p_s[i])
new_elem = [p_s[i], p_e[i]]
p_corrected.append(new_elem)
print(p_corrected)