0

I have a large df and I am trying find all rows where the value in a specific column is above a given number but within a window of say 3 rows and returning only the rows with the highest value over the given number.

A    B    C    D    E
1    5    9    10   15
2    4    7    12   16
3    3    5    10   18
4    2    3    15   17
5    1    1    10   14
6    5    9    17   13
7    4    7    10   14
8    3    5    19   19
9    2    3    10   18
10   4    7    5   14
11   3    5    6   19
12   2    3    7   18

If I wanted to do this with the above example for column D, where the value must be above 11, the output would be.

A    B    C    D    E
2    4    7    12   16
6    5    9    17   13
8    3    5    19   19.

What would be the best way to go about this?

I've tried: df.rolling(3,win_type=None, on='D') but can't find a way to include the greater than condition.

Any help is appreciatted. Thanks!

Mark
  • 934
  • 1
  • 10
  • 25
Iceberg_Slim
  • 422
  • 6
  • 16
  • Does this answer your question? [how do I calculate a rolling idxmax](https://stackoverflow.com/questions/40101130/how-do-i-calculate-a-rolling-idxmax) – Chris Adams Dec 17 '19 at 09:30
  • Not quite, I don't want to return the max in each window as sometimes the values will be below the minimum value I want to set. – Iceberg_Slim Dec 17 '19 at 09:39
  • I think any pandas method will not help, you should go for simple for loop parsing. If you want it then I can help. – Vinod Sawant Dec 17 '19 at 10:14

1 Answers1

2

Edited: Try this:

threshold = 11
window = 3
df['r'] = np.floor(df.index / window)
print(df.groupby('r').apply(lambda x : (x.loc[x['D'] == x['D'].max() ,:]) if x['D'].max() > threshold else None))

You can drop column 'r' after usage.

Output:

      A  B  C   D   E    r
r                          
0.0 1  2  4  7  12  16  0.0
1.0 5  6  5  9  17  13  1.0
2.0 7  8  3  5  19  19  2.0
Divya Dass
  • 166
  • 1
  • 10