How do I find duplicate rows and remove all but one based on the value of one column?

Question

I have a data frame called df that looks like this:

c1 c2
A  1
A  2
A  3
B  1

I want to find all rows where c1 has duplicate values, and keep only the row with the highest c2 value.

The result would look like this:

c1 c2
A  3
B  1

If your data is sorted by `c2`, then `df[!duplicated(df$c1, fromLast = TRUE), ]` will do it. - Gregor Thomas, Feb 17 '17 at 23:00
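
The suggestion in this comment can be sketched as follows. This is a minimal example, assuming the column is named `c1` in lowercase as in the question (R is case-sensitive) and that the data is first sorted by `c2` with `order()`. The names `df_sorted` and `kept` are hypothetical and used only for illustration.

```r
# Rebuild the example data frame from the question
df <- data.frame(
  c1 = c("A", "A", "A", "B"),
  c2 = c(1, 2, 3, 1)
)

# Sort ascending by c2 so the largest c2 in each c1 group comes last
df_sorted <- df[order(df$c2), ]

# fromLast = TRUE marks every occurrence except the last one as a duplicate,
# so negating it keeps the last (highest c2) row of each c1 group
kept <- df_sorted[!duplicated(df_sorted$c1, fromLast = TRUE), ]
print(kept)
```

Unlike a grouped summary, this approach keeps whole rows, so it should still work if the data frame has more columns than `c1` and `c2`.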

score 0 · Answer 1 · answered Feb 17 '17 at 22:56

Since you only have the two columns in your data frame, the function aggregate can do this pretty easily:

aggregate(c2 ~ c1, data = df, FUN = max)

aggregate applies the function FUN to the variable on the left side of the formula (c2), separately for each group defined by the variable on the right side (c1).
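
As a quick check, here is a self-contained sketch that rebuilds the data frame from the question and runs the call above. The variable name `result` is hypothetical, and the data is assumed to match the question's example exactly.

```r
# Rebuild the example data frame from the question
df <- data.frame(
  c1 = c("A", "A", "A", "B"),
  c2 = c(1, 2, 3, 1)
)

# For each distinct value of c1, compute the maximum of c2
result <- aggregate(c2 ~ c1, data = df, FUN = max)
print(result)
```

Note that the result contains only the columns named in the formula, which is why this fits a two-column data frame well.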
