Applying rolling function with second data frame

Question

Let's take two datasets:

import pandas as pd 
import numpy as np
df = pd.DataFrame([1, 2, 3, 2, 5, 4, 3, 6, 7])

check_df = pd.DataFrame([3, 2, 5, 4, 3, 6, 4, 2, 1])

I want to do the following:

If any of the numbers in df[0:3] is greater than check_df[0], return 1, otherwise 0.
If any of the numbers in df[1:4] is greater than check_df[1], return 1, otherwise 0.
And so on.
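A plain-Python reference for the rule above may help pin down the expected result. This sketch reads df[i:i+3] as rows i, i+1 and i+2, which is what rolling(3) followed by shift(-2) computes. It simply skips the last two rows, where a full window does not exist. It is a loop for clarity, not a fast solution:

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3, 2, 5, 4, 3, 6, 7])
check_df = pd.DataFrame([3, 2, 5, 4, 3, 6, 4, 2, 1])

window = 3
values = df[0].to_numpy()
thresholds = check_df[0].to_numpy()

flags = []
for i in range(len(values) - window + 1):
    # 1 if any value in rows i .. i+window-1 of df exceeds row i of check_df, else 0
    flags.append(int((values[i:i + window] > thresholds[i]).any()))

print(flags)  # [0, 1, 0, 1, 1, 0, 1]
```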

This can be done with rolling and a custom function:

def custom_fun(x: pd.Series):
    # x is one window of df's column; the threshold is always row 0 of check_df
    return (x > check_df.iloc[0, 0]).any()

and then passing it to apply on the rolling window:

df.rolling(3, min_periods=3).apply(custom_fun).shift(-2)
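To see why the code above needs shift(-2), here is a small sketch of how rolling(3) labels its results. It uses max() in place of the custom function purely for illustration:

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3, 2, 5, 4, 3, 6, 7])

# rolling(3) labels each result with the LAST row of its window,
# so the max of rows 0..2 lands on label 2 and labels 0 and 1 are NaN
end_aligned = df[0].rolling(3, min_periods=3).max()

# shift(-2) moves every value up two rows, so it sits on the FIRST row of its window
start_aligned = end_aligned.shift(-2)

print(pd.DataFrame({"end_aligned": end_aligned, "start_aligned": start_aligned}))
```

After the shift, row i holds the result for df[i:i+3], which is the row the i-th threshold in check_df lives on.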

The main problem with my solution is that I always compare with check_df[0], whereas in the i-th rolling window I should compare with check_df[i], and I have no idea how to specify that in the rolling function. Could you please give me a hand with this?

IIUC, this was already solved here: https://stackoverflow.com/questions/73065778/compare-two-pandas-dataframes-in-the-most-efficient-way/73066990#73066990. You can just compare `check_df[i]` with the maximum of the rolling window of `df[i:i+3]` — ko3, Jul 22 '22 at 07:10
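A sketch of the comment's suggestion: "some value in the window is greater than the threshold" is equivalent to "the window's maximum is greater than the threshold", so a vectorized rolling max can replace the per-window apply. It assumes both frames share the same default RangeIndex, since comparing two Series with `>` requires identical labels:

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3, 2, 5, 4, 3, 6, 7])
check_df = pd.DataFrame([3, 2, 5, 4, 3, 6, 4, 2, 1])

# max of df[i:i+3], aligned to the window's first row (NaN for the last two rows)
window_max = df[0].rolling(3, min_periods=3).max().shift(-2)

# any(window > threshold) == (max(window) > threshold); NaN > x evaluates to False,
# so .where() restores NaN for the incomplete windows at the end
result = (window_max > check_df[0]).astype(float).where(window_max.notna())

print(result)
```

The trailing NaN rows mirror the output of rolling(...).apply(...).shift(-2); dropping the .where(...) call would report those rows as 0 instead.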

score 1 · Accepted Answer · answered Jul 22 '22 at 07:11

IIUC, you could look up the threshold using the first index label of the window x, for example with first_valid_index:

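def custom_fun(x: pd.Series):
    # x keeps df's index labels, so its first label is the window's starting row;
    # with the default RangeIndex that label is also a valid iloc position
    return (x > check_df.iloc[x.first_valid_index(), 0]).any()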


res = df.rolling(3, min_periods=3).apply(custom_fun).shift(-2)

print(res)

Output

     0
0  0.0
1  1.0
2  0.0
3  1.0
4  1.0
5  0.0
6  1.0
7  NaN
8  NaN

As an alternative, use:

def custom_fun(x: pd.Series):
    # x.index[0] is the label of the first row in the window
    return (x > check_df.iloc[x.index[0], 0]).any()
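Another option, not taken from the answer above, is to skip rolling entirely and build the windows with NumPy's sliding_window_view (available since NumPy 1.20). It works on positions rather than index labels, so it does not rely on the index labels doubling as iloc positions, as both custom functions above do. It does assume that row i of check_df belongs to the window starting at row i of df:

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

df = pd.DataFrame([1, 2, 3, 2, 5, 4, 3, 6, 7])
check_df = pd.DataFrame([3, 2, 5, 4, 3, 6, 4, 2, 1])

window = 3
values = df[0].to_numpy()
thresholds = check_df[0].to_numpy()

# row i of `windows` is values[i:i + window]; shape (len(values) - window + 1, window)
windows = sliding_window_view(values, window)

# compare each window with its own threshold (broadcast across the window axis)
hits = (windows > thresholds[: len(windows), None]).any(axis=1)

# pad the incomplete trailing windows with NaN to match the rolling/shift(-2) output
result = pd.Series(np.nan, index=df.index)
result.iloc[: len(hits)] = hits.astype(float)
print(result)
```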
