Your error stems from assuming what comes to the function inside apply is a dataframe, it is actually a ndarray not a dataframe.
Pandas dataframe apply works on each column/series of the dataframe, so any function passed to apply is applied along each column/series like an internal lambda. In case of windowed dataframe, apply takes each column/series inside the each window and passes to the function as ndarray and the function has to return only array of length 1 per one series per one window. Knowing this saves a lot of pain.
so in your case you cannot use any apply unless you have a complex function that remembers first value of the series a
for each window.
For OP's case if a column of the window say a
is meeting a condition, say > 10
For case where a
in the first row of a window meets condition it is same as searching in dataframe df[df['a']>10]
.
For other conditions like a
in second row of a window is > 10
, checking the entire dataframe works except for the first window of the dataframe.
Following example demonstrates another way to solution.
import numpy as np
import pandas as pd
np.random.seed(123)
df = pd.DataFrame(np.random.randint(0,20,size=(20, 4)), columns=list('abcd'))
df
looks like
a b b d
0 13 2 2 6
1 17 19 10 1
2 0 17 15 9
3 0 14 0 15
4 19 14 4 0
5 16 4 17 3
6 2 7 2 15
7 16 7 9 3
8 6 1 2 1
9 12 8 3 10
10 5 0 11 2
11 10 13 18 4
12 15 11 12 6
13 13 19 16 6
14 14 7 11 7
15 1 11 5 18
16 17 12 18 17
17 1 19 12 9
18 16 17 3 3
19 11 7 9 2
now to select a window if second row inside rolling window of a meets a condition a > 10
like in OP's question.
roll_window=5
search_index=1
df_roll = df['a'].rolling(roll_window)
df_y = df_roll.apply(lambda x:x[1] if x[1] > 10 else np.nan).dropna()
above line returns all values of a
corresponding to condition a
in second row of a window greater then 10. Note the values are right based on example dataframe above but the indexes are defined by how rolling window was centered.
4 17.0
7 19.0
8 16.0
10 16.0
12 12.0
15 15.0
16 13.0
17 14.0
19 17.0
to get the right index location and entire row inside the first dataframe
df.loc[df_y.index+searchindex-rollwindow+1]
returns
a b b d
1 17 19 10 1
4 19 14 4 0
5 16 4 17 3
7 16 7 9 3
9 12 8 3 10
12 15 11 12 6
13 13 19 16 6
14 14 7 11 7
16 17 12 18 17
one could also use np.array(df)
and make a rolling slice corresponding to rolling window and filter the array using slices correspondingly.