I have a dataframe called df that looks like this:
c1 c2
A 1
A 2
A 3
B 1
I want to find all rows where c1 has duplicate values, and keep only the row with the highest c2 value.
The result would look like this:
c1 c2
A 3
B 1
Since you only have the two columns in your data frame, the function aggregate
can do this pretty easily:
aggregate(c2 ~ c1, data = df, FUN = max)
aggregate executes the function FUN on data, grouping by the variables on the right-hand side of the input formula.
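To make this concrete, here is a minimal, self-contained sketch that rebuilds the example data from the question and runs the call above. The construction of df is an assumption based on the table shown in the question. The last part is an alternative that is not in the original answer: it assumes you might have extra columns and want to keep whole rows, and it uses only base R order and duplicated.

```r
# Rebuild the example data frame (assumed from the table in the question).
df <- data.frame(
  c1 = c("A", "A", "A", "B"),
  c2 = c(1, 2, 3, 1)
)

# Left of ~ : the column to summarise (c2).
# Right of ~: the grouping column (c1).
# FUN = max keeps the largest c2 within each c1 group.
result <- aggregate(c2 ~ c1, data = df, FUN = max)
print(result)
#   c1 c2
# 1  A  3
# 2  B  1

# Alternative sketch for data frames with more columns:
# sort so the highest c2 comes first within each c1,
# then keep only the first row seen for each c1.
sorted <- df[order(df$c1, -df$c2), ]
top_rows <- sorted[!duplicated(sorted$c1), ]
print(top_rows)
```

The aggregate version returns only the columns named in the formula, which is why it fits the two-column case so well. The order/duplicated version keeps every column of the selected rows.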