I have a data frame containing duplicate rows (by ID), and I want to drop the duplicates with the lower values, ideally using dplyr. I've tried the following, but it removes some duplicate rows while others remain. Below is an example of what the data frame looks like; which row counts as "lowest" should be determined by Col2. In other words, among duplicate rows, the one with the highest Col2 value should be kept.
Current DataFrame
ID Col1 Col2
ABA 0.65 0.66
ABB 0.65 0.66
ABB 0.65 0.77
ABC 0.55 0.88
ABC 0.14 0.14
ABC 0.15 0.50
ABD 0.25 0.60
Desired DataFrame
ID Col1 Col2
ABA 0.65 0.66
ABB 0.65 0.77
ABC 0.55 0.88
ABD 0.25 0.60
Code Attempt
df %>% group_by(ID) %>% top_n(1, Col2)
and
df <- df[order(df$ID, df$Col2), ]
df <- df[ !duplicated(df$Col2), ]
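For reference, here is a sketch of one approach that should produce the desired output. It uses dplyr's `slice_max()` (available in dplyr >= 1.0.0) to keep, within each ID group, the single row with the largest Col2. Note the second base-R attempt above deduplicates on `Col2` values rather than on `ID`, which is why some duplicates survive; grouping by `ID` avoids that.

```r
library(dplyr)

# Reconstructed from the example in the question.
df <- tibble::tribble(
  ~ID,   ~Col1, ~Col2,
  "ABA", 0.65,  0.66,
  "ABB", 0.65,  0.66,
  "ABB", 0.65,  0.77,
  "ABC", 0.55,  0.88,
  "ABC", 0.14,  0.14,
  "ABC", 0.15,  0.50,
  "ABD", 0.25,  0.60
)

# Within each ID, keep only the row with the highest Col2.
# with_ties = FALSE guarantees exactly one row per ID even if
# two rows share the same maximum Col2.
result <- df %>%
  group_by(ID) %>%
  slice_max(Col2, n = 1, with_ties = FALSE) %>%
  ungroup()

print(result)
```

A base-R equivalent of the same idea is to sort descending by Col2 and then drop duplicated IDs: `df <- df[order(df$ID, -df$Col2), ]; df <- df[!duplicated(df$ID), ]`.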