How to filter a Pandas DataFrame to contain only unique values in one column and aggregate over other columns?

Question

I have a pandas DataFrame like below:

df = pd.DataFrame({"A": ["apple", "apple", "banana", "banana", "banana", "pineapple"],
                   "B": [0.5, 0.77, 0.32, 0.16, 0.05, 1],
                   "C": [132, 44, 32, 11, 0, 5]})

Now, I want to create a DataFrame from this in which I want to keep, for each unique value of column A, only the row with the highest value of column B and throw away the other rows. The desired result would look like this:

A            B        C
apple        0.77     44
banana       0.32     32
pineapple    1        5

Is there an elegant, Python efficient way of doing this? (The real DataFrame is quit big and has more extra columns besides C)

Does this answer your question? [Pandas aggregate count distinct](https://stackoverflow.com/questions/18554920/pandas-aggregate-count-distinct) — skowalak, May 19 '21 at 16:25
No it does not unfortunately. I have lots of columns that I want to keep the same, I just basically want to look at groups of unique values for column A, and keep only 1 row and throw away the others — Peter, May 19 '21 at 16:29

score 2 · Answer 1 · answered May 19 '21 at 16:29

2

Try groupby().idxmax():

df.loc[df.groupby('A')['B'].idxmax()]

Or drop_duplicates on sorted dataframe:

df.sort_values('B').drop_duplicates('A', keep='last')

answered May 19 '21 at 16:29

Quang Hoang

146,074
10
56
74

How to filter a Pandas DataFrame to contain only unique values in one column and aggregate over other columns?

1 Answers1