4

I have a dataframe similar to

  test_a test_b  metric_e
0     OK    NOK        12
1     OK     OK         7
2     OK    NOK         2
3     OK     OK        55

and I want to filter by one condition, meaning that test_a == OK and capture the minimum value on metric_e. I can accomplish that with two lines, copying a dataframe:

df_t = df[df.test_a == 'OK'].reset_index(drop=True)
df_t.iloc[df_t.metric_e.idxmin()].to_frame()

test_a | test_b | metric_e
OK     |  NOK   | 2

Is there a way to do it without having to use an intermediate dataframe?

Erfan
  • 40,971
  • 8
  • 66
  • 78
Ivan
  • 19,560
  • 31
  • 97
  • 141
  • Are you sure your output is right from your provided code? – user3483203 Mar 29 '19 at 20:28
  • If you want to capture the index, df2[(df2.test_a == 'OK') & (df2.metric_e == df2.metric_e.min())] – Vaishali Mar 29 '19 at 20:30
  • I need to get all values in the row in the dataframe that matches condition on test_a and is the minimum value in metric_e on this subset – Ivan Mar 29 '19 at 20:31
  • 1
    @Vaishali - no,because need filter in already filtered data,so cannot be used here – jezrael Mar 29 '19 at 20:33
  • It's similar but probably not a dupe, as the line that you mention will cause an error if the minimum value is filtered out by condition 1. – Ivan Mar 29 '19 at 20:33
  • Well, not right now. I think I would be more concerned about memory use than runtime in this specific case, but my dataset is quite small right now. – Ivan Mar 29 '19 at 20:41

4 Answers4

8

Using nsmallest:

df[df['test_a']=='OK'].nsmallest(1, 'metric_e')

Output:

  test_a test_b  metric_e
2     OK    NOK         2
perl
  • 9,826
  • 1
  • 10
  • 22
  • 2
    Most concise answer imo +1 – Erfan Mar 29 '19 at 20:37
  • 1
    If your question is about how to use it, you don't have to have the values sorted for using `nsmallest`. And if you're asking about how it's implemented in pandas, I just looked it up and it looks very interesting, they're apparently not sorting all the values but using some sort of selection algorithm. Their comment says `This method is equivalent to df.sort_values(columns, ascending=True).head(n), but more performant.` (see https://github.com/pandas-dev/pandas/blob/master/pandas/core/frame.py, search for `nsmallest`) – perl Mar 29 '19 at 21:03
4

In my opinion your solution is nice, also is possible join both rows of code together with double [] for return one row DataFrame:

df = df.loc[[df.loc[df.test_a == 'OK', 'metric_e'].idxmin()]]
print (df)
  test_a test_b  metric_e
2     OK    NOK         2
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

With the output from your code, you can try with:

df[df.metric_e==df.loc[df.test_a.eq('OK'),'metric_e'].min()].T

            2
test_a     OK
test_b    NOK
metric_e    2

If don't want transpose:

df[df.metric_e==df.loc[df.test_a.eq('OK'),'metric_e'].min()]

  test_a test_b  metric_e
2     OK    NOK         2
anky
  • 74,114
  • 11
  • 41
  • 70
0

Slice after sort_values

df.query("test_a=='OK'").sort_values('metric_e').iloc[[0]]# or head(1)
Out[658]: 
  test_a test_b  metric_e
2     OK    NOK         2
BENY
  • 317,841
  • 20
  • 164
  • 234