Return unique row values from a pandas dataframe based on some conditions

Question

I have a problem that I have been trying to solve myself, but I got stuck partway, so I am asking for help here.

I have a pandas dataframe as below:

            x1          y1          x2          y2    confidence   class
0   238.288834  118.716125  300.878754  137.672791    0.885205    0.0
1   238.288834  118.716125  300.878754  137.672791    0.881469    1.0
2   238.288834  118.716125  300.878754  137.672791    0.879645    5.0
3   248.977844  115.054123  321.307007  141.315460    0.876451    0.0
4   248.977844  115.054123  321.307007  141.315460    0.872008    1.0
5    15.000000   10.000000 2298.900000  187.000000    0.700000    0.0

I would like to return one row for each unique combination of x1, y1, x2 and y2: the row with the highest confidence value.

Explanation: In the dataframe above, rows 0, 1 and 2 have the same x1, y1, x2 and y2 values, but I would like to return only row 0, since it has the highest confidence value in that group (0.885205).

The expected outcome would look like:

            x1          y1          x2          y2    confidence   class
0   238.288834  118.716125  300.878754  137.672791    0.885205    0.0
1   248.977844  115.054123  321.307007  141.315460    0.876451    0.0
2    15.000000   10.000000 2298.900000  187.000000    0.700000    0.0

SeaBean · Answer 1 · 2021-10-06T12:41:29.273


You can try:

df.groupby(['x1','y1','x2', "y2"], as_index=False, sort=False)['confidence'].max()

Result:

           x1          y1           x2          y2  confidence
0  238.288834  118.716125   300.878754  137.672791    0.885205
1  248.977844  115.054123   321.307007  141.315460    0.876451
2   15.000000   10.000000  2298.900000  187.000000    0.700000
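For anyone who wants to reproduce this, here is a minimal self-contained sketch. The dataframe is rebuilt by hand from the values in the question, and the names `keys` and `best` are only illustrative. Note that this form keeps just the grouping columns plus `confidence`, so `class` is dropped.

```python
import pandas as pd

# Dataframe rebuilt by hand from the values shown in the question.
df = pd.DataFrame({
    "x1": [238.288834, 238.288834, 238.288834, 248.977844, 248.977844, 15.0],
    "y1": [118.716125, 118.716125, 118.716125, 115.054123, 115.054123, 10.0],
    "x2": [300.878754, 300.878754, 300.878754, 321.307007, 321.307007, 2298.9],
    "y2": [137.672791, 137.672791, 137.672791, 141.315460, 141.315460, 187.0],
    "confidence": [0.885205, 0.881469, 0.879645, 0.876451, 0.872008, 0.70],
    "class": [0.0, 1.0, 5.0, 0.0, 1.0, 0.0],
})

# Illustrative name: the columns that define a "unique" box.
keys = ["x1", "y1", "x2", "y2"]

# as_index=False keeps the keys as ordinary columns;
# sort=False keeps groups in order of first appearance.
best = df.groupby(keys, as_index=False, sort=False)["confidence"].max()
print(best)
```

Grouping on float columns relies on exact equality, which is fine here because the duplicated coordinates are identical values.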

Or, if you want to show all columns, use idxmax() + .loc:

df.loc[df.groupby(['x1','y1','x2', "y2"], sort=False)['confidence'].idxmax()]

Result:

           x1          y1           x2          y2  confidence  class
0  238.288834  118.716125   300.878754  137.672791    0.885205    0.0
3  248.977844  115.054123   321.307007  141.315460    0.876451    0.0
5   15.000000   10.000000  2298.900000  187.000000    0.700000    0.0
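The result above keeps the original index labels (0, 3, 5), while the expected output in the question is numbered 0, 1, 2. A minimal sketch of how that could be handled, assuming the same hand-built dataframe as in the question; the names `keys`, `winners` and `alt` are illustrative. It also shows `sort_values` + `drop_duplicates` as an alternative technique that is not part of the answer above.

```python
import pandas as pd

# Dataframe rebuilt by hand from the values shown in the question.
df = pd.DataFrame({
    "x1": [238.288834, 238.288834, 238.288834, 248.977844, 248.977844, 15.0],
    "y1": [118.716125, 118.716125, 118.716125, 115.054123, 115.054123, 10.0],
    "x2": [300.878754, 300.878754, 300.878754, 321.307007, 321.307007, 2298.9],
    "y2": [137.672791, 137.672791, 137.672791, 141.315460, 141.315460, 187.0],
    "confidence": [0.885205, 0.881469, 0.879645, 0.876451, 0.872008, 0.70],
    "class": [0.0, 1.0, 5.0, 0.0, 1.0, 0.0],
})
keys = ["x1", "y1", "x2", "y2"]

# idxmax() gives, per group, the index label of the row with the
# highest confidence; .loc then pulls those full rows.
winners = df.loc[df.groupby(keys, sort=False)["confidence"].idxmax()]

# Renumber the rows 0..n-1 to match the expected output in the question.
winners = winners.reset_index(drop=True)

# Alternative (not from the answer): sort by confidence descending,
# keep the first row per key, then restore the original row order.
alt = (
    df.sort_values("confidence", ascending=False, kind="stable")
      .drop_duplicates(subset=keys, keep="first")
      .sort_index()
      .reset_index(drop=True)
)

print(winners)
```

Both approaches keep every column, including `class`. On this data they return the same rows.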

edited Oct 06 '21 at 12:41

answered Oct 06 '21 at 12:25

SeaBean


It is dupe, can you convert to wiki? – jezrael Oct 06 '21 at 12:27

@jezrael Ok, converted to wiki – SeaBean Oct 06 '21 at 12:29

@jezrael Seems like `idxmax()` + `.loc` would better match the requirement. I have edited the wiki answer accordingly. – SeaBean Oct 06 '21 at 12:37
