I have a DataFrame with two columns, A and B, and would like to use rolling.apply to make a decision based on the values of both A and B in each sliding window.
Here's some sample code:
```python
import numpy as np
import pandas as pd

np.random.seed(101)
nb = 200
df = pd.DataFrame(np.random.rand(nb, 2),
                  index=pd.date_range('2020-05-15', freq='50ms', periods=nb),
                  columns=['A', 'B'])
```
Here's a toy function that uses the mean to keep the example simple. In reality I run DTW (dynamic time warping) on both A and B of each sliding window and then return a decision.
```python
def my_function(entry):
    # Toy decision: 1 if the window mean of A exceeds that of B, else 0
    if entry['A'].mean() > entry['B'].mean():
        return 1
    else:
        return 0
```
When I run the line below, I get:
- "KeyError: 'A'" when using raw=False, and
- "IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices" when using raw=True.
```python
df['decision'] = df.rolling(window='4s', min_periods=80).apply(my_function, raw=False)
```
I had used this approach (entry['A']) before with pandas resample, and it worked. Reading the pandas documentation, I found that rolling apply does not pass a DataFrame to the function; instead it passes either an ndarray (raw=True) or a Series (raw=False).
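A small sketch that records what rolling.apply hands to the function may make this concrete. It uses a tiny, hypothetical three-row frame rather than the data above, and the names `calls` and `record` are illustrative only. Each recorded window turns out to be one-dimensional and to hold values from a single column, which is why `entry['A']` cannot work: a NumPy array cannot be indexed by a string, and a Series with a date index has no label 'A'.

```python
import pandas as pd

# Tiny frame with the same A/B layout as in the question (hypothetical values).
df = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [4.0, 5.0, 6.0]})

calls = []

def record(window):
    # Remember the type and contents of each window rolling.apply passes in;
    # return a dummy scalar, since apply expects one number per window.
    calls.append((type(window).__name__, [float(x) for x in window]))
    return 0.0

df.rolling(2).apply(record, raw=True)   # each window arrives as a 1-D ndarray
df.rolling(2).apply(record, raw=False)  # each window arrives as a Series

for c in calls:
    print(c)
```

Every printed window contains values from either A or B, never both.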
So I amended the function as follows to debug:
```python
def my_function(entry):
    # Debug: show what rolling.apply actually passes in
    print(entry.shape)
    print(entry)
    return 99
```
The output shows that my_function receives the sliding windows of column A one by one, and only afterwards the sliding windows of column B, also one by one.
Therefore I cannot make a decision based on both columns A and B for each sliding window.
How can I resolve this?
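One possible workaround, sketched under assumptions rather than offered as the definitive fix, is to roll over a single column so that each window is visited exactly once. With raw=False the window Series keeps its DatetimeIndex, so those timestamps can be used to look up the matching rows of both A and B in df. A second option iterates over the rolling object, which in pandas 1.1 and later yields one DataFrame per window. The sketch assumes that iteration does not apply min_periods, so it filters short windows by hand. The names `window_decision`, `decision_apply` and `decision_iter` are hypothetical, and my_function is the toy rule from above written as a one-liner.

```python
import numpy as np
import pandas as pd

np.random.seed(101)
nb = 200
df = pd.DataFrame(np.random.rand(nb, 2),
                  index=pd.date_range('2020-05-15', freq='50ms', periods=nb),
                  columns=['A', 'B'])

def my_function(entry):
    # entry is a DataFrame holding both columns A and B for one window.
    return 1 if entry['A'].mean() > entry['B'].mean() else 0

# Option 1: roll over column A only (one call per window) and use the
# window's timestamps to fetch the matching rows of both A and B.
def window_decision(a_window):
    return my_function(df.loc[a_window.index, ['A', 'B']])

decision_apply = df['A'].rolling(window='4s', min_periods=80).apply(
    window_decision, raw=False)

# Option 2 (pandas >= 1.1): iterate over the rolling object, which yields one
# DataFrame per row. Iteration is assumed not to apply min_periods, so windows
# shorter than 80 rows are filtered by hand.
decision_iter = pd.Series(
    [my_function(w) if len(w) >= 80 else np.nan for w in df.rolling(window='4s')],
    index=df.index)

df['decision'] = decision_apply
```

Both options pass my_function the actual rows of A and B for each window, so a DTW-based function should slot in the same way. The df.loc lookup in option 1 relies on the index being unique. For the toy mean rule alone, comparing df['A'].rolling(...).mean() with df['B'].rolling(...).mean() would avoid the Python-level calls entirely.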