Let's say I have two columns, 'a' and 'b', of time series data in a Pandas dataframe. I want to create a third column that indicates if the difference between column 'a' at the current time period and column 'b' at any of the next 5 time periods increases by 8 or more before it decreases by 2 or more. Ideally I'd do this using some form of df.rolling(5).apply() and without any loops, but I keep running into challenges.
For demonstration's sake, I've written the logic out with loops but if anyone can give me some direction on how to do this more efficiently or elegantly, I'd really appreciate it. In reality the dataframe and the window will be much larger.
df = pd.DataFrame({'a':[1,2,3,4,5,6,7,8,9,10], 'b':[1,0,9,0,15,0,20,15,23,6]})
df['c'] = 0
window = 5
positive_thresh = 8
negative_thresh = -2
num_rows = df.shape[0]
for a_idx in range(num_rows):
a_start = df.iloc[a_idx,0]
b_roll = df.iloc[(a_idx + 1):max(a_idx + 1 + window,num_rows), 1]
deltas = b_roll - a_start
positives = deltas[deltas>=positive_thresh]
negatives = deltas[deltas<=negative_thresh]
first_pos_idx = positives.index[0] if len(positives) > 0 else num_rows
first_neg_idx = negatives.index[0] if len(negatives) > 0 else num_rows
if first_pos_idx < first_neg_idx:
df.iloc[a_idx,2] = 1
print(df)
a b c
0 1 1 1
1 2 0 0
2 3 9 0
3 4 0 1
4 5 15 0
5 6 0 1
6 7 20 1
7 8 15 1
8 9 23 0
9 10 6 0