Consider the following pandas DataFrames:

import numpy as np
import pandas as pd

df = pd.DataFrame([1, 2, 3, 2, 5, 4, 3, 7, 2])
df_top = pd.DataFrame([1, 2, 4, 5, 2, 3, 4, 5, 1])
label_frame = pd.DataFrame([0, 0, 0, 0, 0, 0, 0, 0, 0])

I want to do the following:

If any of the values in df.iloc[0:3] is greater than df_top.iloc[0], assign to the first element of label_frame the smallest index at which this holds.

For the first iteration it should look like this:

The program checks df.iloc[0] > df_top.iloc[0] (False), df.iloc[1] > df_top.iloc[0] (True) and df.iloc[2] > df_top.iloc[0] (True), so it should replace the first element of label_frame with 1, since that is the smallest index for which the inequality is satisfied.

I want to repeat this over the whole DataFrame df using .rolling combined with .apply (so the second step should compare df[1:4] with df_top[1] and replace the second element of label_frame).

Do you know how this can be done? I tried playing with a custom function and with a lambda, but I have no idea how to get a rolling window of df and return the smallest index for which the inequality is satisfied.

for i in range(len(label_frame) - 3):
    if (df.iloc[i:i+3] > df_top.iloc[i]).any()[0]:
        label_frame.iloc[i] = np.where(df.iloc[i:i+3] > df_top.iloc[i])[0].min()
label_frame.iloc[-2:, 0] = np.nan
label_frame

    0
0   1.0
1   1.0
2   2.0
3   0.0
4   0.0
5   0.0
6   0.0
7   NaN
8   NaN
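
The loop above could perhaps be expressed with .rolling and .apply along the following lines. This is only a sketch under a few assumptions: it relies on pandas.api.indexers.FixedForwardWindowIndexer to make the window look forward, on raw=False so that each window arrives as a Series that keeps its original index, and it writes NaN (rather than 0) when no value in the window exceeds the threshold. The helper name first_hit is illustrative and not part of pandas.

```python
import numpy as np
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

df = pd.DataFrame([1, 2, 3, 2, 5, 4, 3, 7, 2])
df_top = pd.DataFrame([1, 2, 4, 5, 2, 3, 4, 5, 1])


def first_hit(window):
    # Hypothetical helper: with raw=False the window is a Series carrying
    # the original index, so its first label selects the df_top row to use.
    top = df_top[0].loc[window.index[0]]
    hits = (window > top).to_numpy()
    # Position of the first True inside the window, NaN if there is none
    return float(hits.argmax()) if hits.any() else np.nan


# Forward-looking windows: row i covers rows i, i+1 and i+2
indexer = FixedForwardWindowIndexer(window_size=3)

# min_periods=3 leaves the trailing incomplete windows as NaN
result = df[0].rolling(indexer, min_periods=3).apply(first_hit, raw=False)
label_frame = result.to_frame()
```

Because "no match" becomes NaN here, a 0 in the result always means the first value of the window already exceeds the threshold, which the loop above cannot distinguish from "nothing found".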
Lucian

1 Answer

IIUC, and if you only want to test 3 values, the easiest approach might be to use a 2D comparison:

a = df.assign(**{'1': df[0].shift(-1), '2': df[0].shift(-2)}).eq(df_top).to_numpy()
m = a.any(1)
label_frame[0] = df.index + np.where(m, a.argmax(1), np.nan)

output:

     0
0  0.0
1  1.0
2  NaN
3  NaN
4  NaN
5  NaN
6  NaN
7  NaN
8  NaN
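
The snippet above tests for equality with .eq. If the strict > comparison described in the question is what is wanted, the same 2D-comparison idea could be sketched with NumPy as below. This is an assumption-laden variant, not the code above: it uses numpy.lib.stride_tricks.sliding_window_view (available in NumPy 1.20 and later), reports the position inside each window rather than the absolute index, and uses NaN both for windows with no match and for the trailing incomplete windows.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

df = pd.DataFrame([1, 2, 3, 2, 5, 4, 3, 7, 2])
df_top = pd.DataFrame([1, 2, 4, 5, 2, 3, 4, 5, 1])

window = 3  # window length taken from the question

values = df[0].to_numpy()
tops = df_top[0].to_numpy()

# One row per complete window: row i holds values[i:i + window]
windows = sliding_window_view(values, window)
n_full = len(windows)

# Compare every window with the df_top value at the window's first row
hits = windows > tops[:n_full, None]

# First matching position per window, NaN when nothing matches
first = np.where(hits.any(axis=1), hits.argmax(axis=1), np.nan)

# Trailing rows without a complete window stay NaN
labels = np.full(len(values), np.nan)
labels[:n_full] = first
label_frame = pd.DataFrame(labels)
```

For the sample data this gives 1, 1, 2, NaN, 0, 0, 1, NaN, NaN. Adding np.arange(len(values)) to labels would turn the window positions into absolute row positions, if that is preferred.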
mozway
  • I updated my question by adding an example of the desired output. Could you please check whether we are on the same page? – Lucian Jul 21 '22 at 15:02