Filter a Pandas dataframe by a condition and a minimum value in a column

Question

I have a dataframe similar to

  test_a test_b  metric_e
0     OK    NOK        12
1     OK     OK         7
2     OK    NOK         2
3     OK     OK        55

and I want to filter by one condition, meaning that test_a == OK and capture the minimum value on metric_e. I can accomplish that with two lines, copying a dataframe:

df_t = df[df.test_a == 'OK'].reset_index(drop=True)
df_t.iloc[df_t.metric_e.idxmin()].to_frame()

test_a | test_b | metric_e
OK     |  NOK   | 2

Is there a way to do it without having to use an intermediate dataframe?

If you want to capture the index, df2[(df2.test_a == 'OK') & (df2.metric_e == df2.metric_e.min())] — Vaishali, Mar 29 '19 at 20:30
I need to get all values in the row in the dataframe that matches condition on test_a and is the minimum value in metric_e on this subset — Ivan, Mar 29 '19 at 20:31
@Vaishali - no,because need filter in already filtered data,so cannot be used here — jezrael, Mar 29 '19 at 20:33
It's similar but probably not a dupe, as the line that you mention will cause an error if the minimum value is filtered out by condition 1. — Ivan, Mar 29 '19 at 20:33
Well, not right now. I think I would be more concerned about memory use than runtime in this specific case, but my dataset is quite small right now. — Ivan, Mar 29 '19 at 20:41

score 8 · Answer 1 · answered Mar 29 '19 at 20:35

8

Using nsmallest:

df[df['test_a']=='OK'].nsmallest(1, 'metric_e')

Output:

  test_a test_b  metric_e
2     OK    NOK         2

answered Mar 29 '19 at 20:35

perl

9,826
1
10
22

2

Most concise answer imo +1 – Erfan Mar 29 '19 at 20:37
1

If your question is about how to use it, you don't have to have the values sorted for using `nsmallest`. And if you're asking about how it's implemented in pandas, I just looked it up and it looks very interesting, they're apparently not sorting all the values but using some sort of selection algorithm. Their comment says `This method is equivalent to df.sort_values(columns, ascending=True).head(n), but more performant.` (see https://github.com/pandas-dev/pandas/blob/master/pandas/core/frame.py, search for `nsmallest`) – perl Mar 29 '19 at 21:03

score 4 · Accepted Answer · answered Mar 29 '19 at 20:35

In my opinion your solution is nice, also is possible join both rows of code together with double [] for return one row DataFrame:

df = df.loc[[df.loc[df.test_a == 'OK', 'metric_e'].idxmin()]]
print (df)
  test_a test_b  metric_e
2     OK    NOK         2

score 1 · Answer 3 · answered Mar 29 '19 at 20:31

With the output from your code, you can try with:

df[df.metric_e==df.loc[df.test_a.eq('OK'),'metric_e'].min()].T

            2
test_a     OK
test_b    NOK
metric_e    2

If don't want transpose:

df[df.metric_e==df.loc[df.test_a.eq('OK'),'metric_e'].min()]

  test_a test_b  metric_e
2     OK    NOK         2

score 0 · Answer 4 · answered Mar 29 '19 at 20:42

0

Slice after sort_values

df.query("test_a=='OK'").sort_values('metric_e').iloc[[0]]# or head(1)
Out[658]: 
  test_a test_b  metric_e
2     OK    NOK         2

answered Mar 29 '19 at 20:42

BENY

317,841
20
164
234

Filter a Pandas dataframe by a condition and a minimum value in a column

4 Answers4

Linked