I have a DataFrame with two columns, A and B, and would like to use rolling().apply() to make a decision based on the values of both A and B in each sliding window.

Here is some sample code:

import numpy as np
import pandas as pd

np.random.seed(101)
nb=200

df = pd.DataFrame(np.random.rand(nb, 2),
                  index=pd.date_range('2020-05-15', freq='50ms', periods=nb),
                  columns=['A', 'B'])

Here's a toy function that uses the mean to keep the example simple. In reality I compute DTW on both A and B in each sliding window and then return a decision.

def my_function(entry):
    # Intended: entry holds both columns of one window
    if entry['A'].mean() > entry['B'].mean():
        return 1
    else:
        return 0

When I run the line of code below, I get:

"KeyError: 'A'" when using raw=True, and

"IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices" when using raw=False.

df['decision'] = df.rolling(window='4s',min_periods=80).apply(my_function, raw=False)

I had used this approach (entry['A']) before with pandas resample and it worked. Reading the pandas documentation, I found that rolling apply does not pass a DataFrame to the function; it passes either an ndarray (raw=True) or a Series (raw=False).

So I changed the function as follows to debug:

def my_function(entry):
    print(entry.shape)
    print(entry)
    return(99)

The problem is that my_function receives the sliding windows of column A one by one, and only afterwards the sliding windows of column B.

Therefore I cannot make a decision based on both columns A and B in each sliding window.
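The behaviour described above can be checked with a small sketch that records what the callback receives instead of printing it. The names `calls` and `record` are made up for this illustration; the only thing it relies on is that rolling().apply() with raw=False hands the function a one-dimensional object per call.

```python
import numpy as np
import pandas as pd

np.random.seed(101)
nb = 200
df = pd.DataFrame(np.random.rand(nb, 2),
                  index=pd.date_range('2020-05-15', freq='50ms', periods=nb),
                  columns=['A', 'B'])

calls = []  # hypothetical log: one (type name, shape) entry per callback invocation

def record(window):
    # Store the type and shape of whatever pandas passes in
    calls.append((type(window).__name__, window.shape))
    return 0.0  # rolling().apply() expects a single number back

df.rolling(window='4s', min_periods=80).apply(record, raw=False)

# Every entry is a 1-D Series, never a two-column DataFrame
print(set(name for name, _ in calls))
print(set(len(shape) for _, shape in calls))
```

Since each call only sees a single column, indexing the argument with 'A' or 'B' cannot work inside the callback.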

How can I resolve this?

jsammut

1 Answer

For the toy example you can skip apply and compare the rolling means of the two columns directly:

import numpy as np
import pandas as pd

np.random.seed(101)

df = pd.DataFrame(np.random.rand(200,2),
                  index=pd.date_range('2020-05-15', freq='50ms', periods=200),columns=['A','B'])


# Build the rolling object once and reuse it for both columns
roll = df.rolling(window='4s', min_periods=80)
mean_a = roll['A'].mean()
mean_b = roll['B'].mean()

# NaN where the window has fewer than min_periods rows, else 1 if mean(A) > mean(B), else 0
df['decision'] = np.where(mean_a.isna(), np.nan, (mean_a > mean_b).astype(float))
David
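The vectorised comparison above only works when the decision can be written in terms of built-in rolling statistics. For an arbitrary function of both columns (such as the DTW check mentioned in the question), one possible workaround, not taken from the answer above, is to roll over a single column with raw=False and use the index of the window to look up the matching rows of the full DataFrame. The name `decide` is hypothetical, the sketch assumes the index is unique, and it will be much slower than the vectorised version.

```python
import numpy as np
import pandas as pd

np.random.seed(101)
nb = 200
df = pd.DataFrame(np.random.rand(nb, 2),
                  index=pd.date_range('2020-05-15', freq='50ms', periods=nb),
                  columns=['A', 'B'])

def decide(window):
    # window is the Series of column A for one window; its index
    # identifies the rows, so fetch both columns for those rows
    sub = df.loc[window.index, ['A', 'B']]
    # Toy rule from the question; replace with the real two-column logic
    return 1.0 if sub['A'].mean() > sub['B'].mean() else 0.0

# Roll over one column only, so the callback runs once per window
decision = df['A'].rolling(window='4s', min_periods=80).apply(decide, raw=False)
df['decision'] = decision
```

Windows with fewer than min_periods rows never reach the callback and stay NaN in the result.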