Filter Values in Python of a Pandas Dataframe

Question

I have a dataset that I need to filter once a value has been exceeded but not after. Here is an example of the dataframe:

    Dip    MD
0   70      5000
1   80      6000
2   90      7000
3   80      8000

I want to filter out everything before Dip goes above 85 the first time so the resultant array should look like this:

    Dip     MD
0   90      7000
1   80      8000

score 1 · Answer 1 · edited Jan 24 '19 at 22:43

1

Maybe using cummax

In [71]: df = pd.DataFrame({'Dip': [70, 80, 90, 80], 
    ...:     'MD': [5000, 6000, 7000, 8000]})         

In [72]: df[df.Dip.gt(85).cummax()]                   
Out[72]: 
   Dip    MD
2   90  7000
3   80  8000

edited Jan 24 '19 at 22:43

lexual

46,172
2
15
14

answered Jan 24 '19 at 22:41

BENY

317,841
20
164
234

jpp · Answer 2 · 2019-01-24T23:01:11.570

You can first find the positional index of the first value satisfying a condition:

idx = next(iter(np.where(df['Dip'] > 85)[0]), df.shape[0])

Then slice your dataframe by integer position from this value onwards:

res = df.iloc[idx:]

Choosing df.shape[0] as the default if your condition is never satisfied ensures the entire dataframe is returned in this scenario.

Performance note

For larger data sets, you may find integer indexing more efficient than Boolean indexing:

np.random.seed(0)

df = pd.DataFrame({'A': np.random.randint(0, 100, 10**6)})

%timeit df[df['A'].gt(90).cummax()]                                   # 36.1 ms
%timeit df.iloc[next(iter(np.where(df['A'] > 90)[0]), df.shape[0]):]  # 4.04 ms

If efficiency is a primary concern, see Efficiently return the index of the first value satisfying condition in array. The idea is you don't have to traverse your entire series if the condition is satisfied earlier.

Filter Values in Python of a Pandas Dataframe

2 Answers2