1

I have a dataset that I need to filter once a value has been exceeded but not after. Here is an example of the dataframe:

    Dip    MD
0   70      5000
1   80      6000
2   90      7000
3   80      8000

I want to filter out everything before Dip goes above 85 the first time so the resultant array should look like this:

    Dip     MD
0   90      7000
1   80      8000
jpp
  • 159,742
  • 34
  • 281
  • 339
aise0603
  • 11
  • 4

2 Answers2

1

Maybe using cummax

In [71]: df = pd.DataFrame({'Dip': [70, 80, 90, 80], 
    ...:     'MD': [5000, 6000, 7000, 8000]})         

In [72]: df[df.Dip.gt(85).cummax()]                   
Out[72]: 
   Dip    MD
2   90  7000
3   80  8000
lexual
  • 46,172
  • 2
  • 15
  • 14
BENY
  • 317,841
  • 20
  • 164
  • 234
0

You can first find the positional index of the first value satisfying a condition:

idx = next(iter(np.where(df['Dip'] > 85)[0]), df.shape[0])

Then slice your dataframe by integer position from this value onwards:

res = df.iloc[idx:]

Choosing df.shape[0] as the default if your condition is never satisfied ensures the entire dataframe is returned in this scenario.

Performance note

For larger data sets, you may find integer indexing more efficient than Boolean indexing:

np.random.seed(0)

df = pd.DataFrame({'A': np.random.randint(0, 100, 10**6)})

%timeit df[df['A'].gt(90).cummax()]                                   # 36.1 ms
%timeit df.iloc[next(iter(np.where(df['A'] > 90)[0]), df.shape[0]):]  # 4.04 ms

If efficiency is a primary concern, see Efficiently return the index of the first value satisfying condition in array. The idea is you don't have to traverse your entire series if the condition is satisfied earlier.

jpp
  • 159,742
  • 34
  • 281
  • 339