I have a data frame:

dframe <- data.frame(id = c(1,2,3,2,2), name = c("Google","Google","Google","Amazon","Google"))

How can I check both columns at the same time and remove the rows that are duplicated across both?

Expected output:

data.frame(id = c(1,3,2,2), name = c("Google","Google","Amazon","Google"))

What I tried:

dframe[!duplicated(dframe ["id", "name"]), ] 
Erik Bodg

1 Answer

The distinct function from dplyr might be what you are looking for:

library(dplyr)

dframe %>%
    distinct(id, name)

  id   name
1  1 Google
2  2 Google
3  3 Google
4  2 Amazon
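
If you would rather stay in base R, the attempt from the question can probably be made to work with a small change. `dframe["id", "name"]` asks for a row named "id" rather than for two columns, so `duplicated()` never sees the pairs. A minimal sketch, assuming the same `dframe` as in the question (`is_dup` and `deduped` are names picked here for illustration):

```r
dframe <- data.frame(
  id = c(1, 2, 3, 2, 2),
  name = c("Google", "Google", "Google", "Amazon", "Google")
)

# Select both columns by name, then flag rows whose (id, name) pair
# has already appeared earlier in the data frame
is_dup <- duplicated(dframe[, c("id", "name")])

# Keep only the first occurrence of each pair
deduped <- dframe[!is_dup, ]
deduped
```

This should keep the first four rows and drop the fifth, which repeats the pair (2, "Google").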
Alex
  • If the data frame has more than these two columns and I want to check only these two, it doesn't work. – Erik Bodg May 09 '21 at 10:11
  • So you would want to specify, inside `distinct`, all the columns that are relevant for your definition of duplicates. If this includes all columns, you can just write `distinct()`. – Alex May 09 '21 at 10:25
  • No, it is just as you saw, but in my real dataset it keeps only the columns that are inside `distinct`. Any idea why this happens? – Erik Bodg May 09 '21 at 14:11
  • Ah, then you want to set `.keep_all = TRUE` to keep the other columns. – Alex May 09 '21 at 14:14
  • Is this the right form: `distinct(id, name, .keep_all = TRUE)`? – Erik Bodg May 09 '21 at 19:13
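
For the case discussed in the comments, where the data frame has more columns than the two being checked, here is a small sketch of what `.keep_all = TRUE` changes. The `revenue` column and its values are hypothetical, added only to have a third column to look at; `pairs_only` and `full_rows` are illustrative names.

```r
library(dplyr)

# Hypothetical extra column `revenue`, added only for illustration
dframe <- data.frame(
  id = c(1, 2, 3, 2, 2),
  name = c("Google", "Google", "Google", "Amazon", "Google"),
  revenue = c(10, 20, 30, 40, 50)
)

# Without .keep_all, only the columns named in distinct() are returned
pairs_only <- dframe %>%
    distinct(id, name)

# With .keep_all = TRUE, duplicates are still judged on id and name only,
# but the first row of each pair is kept together with all its columns
full_rows <- dframe %>%
    distinct(id, name, .keep_all = TRUE)

names(pairs_only)
names(full_rows)
```

So the form `distinct(id, name, .keep_all = TRUE)` looks right for this. Note that when several rows share the same pair, only the first one's values in the other columns survive.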