
I have data with columns "ID" and "value", where an ID might be repeated. I would like to find all rows with duplicate IDs and keep only the one with the higher value.

mydf <- data.frame(ID = c(1,2,2,3,4), value = c(5, 8, 20, 18,15))

I am working with dplyr. So far I can find the duplicates:

library(dplyr)

find_dup <- function(dataset, var) {
  # keep only values of var that appear more than once, sorted by var
  dataset %>% group_by({{ var }}) %>% filter(n() > 1) %>% ungroup() %>% arrange({{ var }})
}
find_dup(mydf, ID)
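
For reference, on the sample data above this returns just the two rows with ID 2 (values 8 and 20):

#> # A tibble: 2 x 2
#>      ID value
#>   <dbl> <dbl>
#> 1     2     8
#> 2     2    20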

But I am having trouble with the next step; I'm not sure how to "point to" the larger value within each group of duplicates. I'm hoping to stay with a tidyverse solution for now if possible. Any thoughts welcome, thanks!

marcel

1 Answer


Rather than specifically identifying and removing duplicates, you could group_by ID and use slice_max to keep the top value in each group.

library(dplyr)

mydf <- data.frame(ID = c(1, 2, 2, 3, 4), value = c(5, 8, 20, 18, 15))

mydf %>% 
  group_by(ID) %>% 
  slice_max(value, n = 1) %>%
  ungroup()
#> # A tibble: 4 x 2
#>      ID value
#>   <dbl> <dbl>
#> 1     1     5
#> 2     2    20
#> 3     3    18
#> 4     4    15

Created on 2023-08-07 with reprex v2.0.2
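
One small variation worth noting: slice_max() keeps all tied rows by default, so if two rows share the same maximum value for an ID you will get both back. If you only ever want a single row per ID, pass with_ties = FALSE:

mydf %>% 
  group_by(ID) %>% 
  slice_max(value, n = 1, with_ties = FALSE) %>%
  ungroup()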

Allan Cameron