I have a subset of a bigger dataset, with a total count `n` for each observation. I want to drop duplicates so that, for each name, only the row with the highest count is kept and the codes that appear less often are dropped. How would I go about that? For instance:

name = c("a", "a", "b", "b", "b", "c", "d", "e", "e", "e")
code = c(1,1,2,3,4,1,1,2,2,3)
n = c(1,10,2,3,5,4,8,100,90,40)
data = data.frame(name,code,n)

The end result should contain only these rows:

name = c("a", "b", "c", "d", "e")
code = c(1,4,1,1,2)
n = c(10,5,4,8,100)
data2 = data.frame(name,code,n)
Sun
  • 1
    Side note: do not do `data.frame(cbind(...))`. You've turned all your numeric variables into characters. The function `data.frame()` is all you need: `data.frame(name,code,n)`. – joran Nov 14 '18 at 19:34
  • 1
    @joran Thank you. will change that now – Sun Nov 14 '18 at 19:42
  • 2
    Possible duplicate of [Remove duplicates keeping entry with largest absolute value](https://stackoverflow.com/questions/12805964/remove-duplicates-keeping-entry-with-largest-absolute-value) – Daniel Fischer Nov 14 '18 at 19:56

1 Answer

If you can use dplyr, this should do the trick:

library(dplyr)
data %>%
  group_by(name) %>%
  filter(n == max(n)) %>%
  ungroup()
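
Note that `filter(n == max(n))` keeps every row tied for the maximum, so a name whose top count occurs twice would still appear twice. If exactly one row per name is needed, or dplyr is not available, here is a minimal base R sketch. It assumes ties should be broken by original row order, which the question does not specify; `ordered` and `result` are just illustrative names.

```r
# Example data from the question
name <- c("a", "a", "b", "b", "b", "c", "d", "e", "e", "e")
code <- c(1, 1, 2, 3, 4, 1, 1, 2, 2, 3)
n <- c(1, 10, 2, 3, 5, 4, 8, 100, 90, 40)
data <- data.frame(name, code, n)

# Sort by name, then by n in descending order,
# so the row with the largest n comes first within each name
ordered <- data[order(data$name, -data$n), ]

# duplicated() flags every repeat of a name after its first occurrence,
# so negating it keeps exactly one row (the largest n) per name
result <- ordered[!duplicated(ordered$name), ]
rownames(result) <- NULL
result
```

On the example data this should give the same five rows as `data2` in the question.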
dmca