0

I have a data frame that has repeating values in 2 columns and I only want to keep the highest value of each combination. For the following data frame:

df = pd.DataFrame(
np.array([['A', 'B ', 3], ['A', 'B', 6], ['C', 'D', 9],  ['C', 'D', 2], ['C', 'B', 4]]))
df

how would I get this dataframe as a result:

|A|B|6|
|C|D|9|
|C|B|4|

Here's my code:

df = df.groupby([0]).max().sort_values(2,ascending=False)
df

and this is what it returns:

|A|B|6|
|C|D|9|

My problem with my code is that it only sorts the values on the first column (so CB is the same as CB, but I want 2 separate values returned). I only want to keep the highest row for all combinations. Some posts very similar but different is this one. Can someone please let me know what I can do to fix this problem? Thanks!

UserX
  • 105
  • 1
  • 10
  • related: https://stackoverflow.com/questions/61721971/how-to-find-highest-combination-in-dataframe – anky May 13 '20 at 17:33

2 Answers2

2

You can do:

df.sort_values(2).drop_duplicates([0,1], keep='last')

Output:

   0  1  2
4  C  B  4
1  A  B  6
2  C  D  9
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
2

Like this:

In [131]: df.groupby([0, 1], as_index=False, sort=False).max()                                                                                                                                              
Out[131]: 
   0  1  2
0  A  B  6
1  C  D  9
2  C  B  4
Mayank Porwal
  • 33,470
  • 8
  • 37
  • 58