-1

I have a pandas DataFrame like below:

df = pd.DataFrame({"A": ["apple", "apple", "banana", "banana", "banana", "pineapple"],
                   "B": [0.5, 0.77, 0.32, 0.16, 0.05, 1],
                   "C": [132, 44, 32, 11, 0, 5]})

Now, I want to create a DataFrame from this in which I want to keep, for each unique value of column A, only the row with the highest value of column B and throw away the other rows. The desired result would look like this:

A            B        C
apple        0.77     44
banana       0.32     32
pineapple    1        5

Is there an elegant, Python efficient way of doing this? (The real DataFrame is quit big and has more extra columns besides C)

Peter
  • 722
  • 6
  • 24
  • Does this answer your question? [Pandas aggregate count distinct](https://stackoverflow.com/questions/18554920/pandas-aggregate-count-distinct) – skowalak May 19 '21 at 16:25
  • 1
    `df.sort_values('B').drop_duplicates('A',keep='last')` – anky May 19 '21 at 16:28
  • No it does not unfortunately. I have lots of columns that I want to keep the same, I just basically want to look at groups of unique values for column A, and keep only 1 row and throw away the others – Peter May 19 '21 at 16:29

1 Answers1

2

Try groupby().idxmax():

df.loc[df.groupby('A')['B'].idxmax()]

Or drop_duplicates on sorted dataframe:

df.sort_values('B').drop_duplicates('A', keep='last')
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74