Remove duplicates by checking more than one column of a dataframe

Question

Having a dataframe:

dframe <- data.frame(id = c(1,2,3,2,2), name = c("Google","Google","Google","Amazon","Google"))

How can I check both columns at the same time and remove the duplicate rows?

Example output

data.frame(id = c(1,3,2,2), name = c("Google","Google","Amazon","Google"))

What I tried

dframe[!duplicated(dframe ["id", "name"]), ]
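The attempt does not do what is intended because `dframe["id", "name"]` is `[row, column]` indexing: it looks up a row named "id" in the column "name" instead of selecting the two columns. A minimal base R sketch, assuming the goal is to compare only the listed columns, passes a character vector of column names inside single brackets (`cols` is just an illustrative variable name):

```r
# Example data from the question
dframe <- data.frame(id = c(1, 2, 3, 2, 2),
                     name = c("Google", "Google", "Google", "Amazon", "Google"))

# dframe["id", "name"] means [row, column], so it looks for a row named "id".
# A character vector inside single brackets selects columns instead:
cols <- c("id", "name")

# duplicated() on that two-column data frame flags rows whose id/name pair
# has already appeared; negating it keeps the first occurrence of each pair.
deduped <- dframe[!duplicated(dframe[cols]), ]
print(deduped)
```

The same pattern works when the data frame has more columns than the ones that define a duplicate: only the columns in `cols` are compared, but every column is kept in the result.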

Try `duplicated(dframe)`. More exactly, `dframe[!duplicated(dframe), ]`. — Rui Barradas, May 09 '21 at 08:39
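A short base R sketch of the suggestion in the comment above. `duplicated()` on a whole data frame compares complete rows, so every column takes part in the check. The example output in the question happens to keep the *last* copy of the repeated row (2, "Google"), which `fromLast = TRUE` reproduces; `keep_first` and `keep_last` are illustrative names:

```r
dframe <- data.frame(id = c(1, 2, 3, 2, 2),
                     name = c("Google", "Google", "Google", "Amazon", "Google"))

# Keep the first occurrence of each complete row (rows 1, 2, 3, 4).
keep_first <- dframe[!duplicated(dframe), ]

# fromLast = TRUE scans from the bottom up, so the last occurrence is kept
# (rows 1, 3, 4, 5), matching the row order of the example output.
keep_last <- dframe[!duplicated(dframe, fromLast = TRUE), ]
print(keep_last)
```

Both versions contain the same set of id/name pairs; they differ only in which copy of the duplicate survives.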

score 1 · Answer 1 · answered May 09 '21 at 08:39


The `distinct()` function from dplyr might be what you are looking for:

library(dplyr)

dframe %>%
    distinct(id, name)

  id   name
1  1 Google
2  2 Google
3  3 Google
4  2 Amazon


Alex

If the dataframe has more than these two columns and I want to check all of them, it doesn't work. – Erik Bodg May 09 '21 at 10:11
So you would want to specify, inside `distinct()`, all the columns that are relevant to your definition of a duplicate. If this includes all columns, you can just write `distinct()`. – Alex May 09 '21 at 10:25
No, it is just as you saw, but in my real dataset it keeps only the columns that are inside `distinct()`. Any idea why this happens? – Erik Bodg May 09 '21 at 14:11
Ah, then you want to set `.keep_all = TRUE` to keep the other columns. – Alex May 09 '21 at 14:14
Is it this form: `distinct(id, name, .keep_all = TRUE)`? – Erik Bodg May 09 '21 at 19:13
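A minimal sketch of the two `distinct()` variants discussed in the comments, assuming dplyr is installed. The extra column `visits` is hypothetical and stands in for the other columns of a real dataset. With `.keep_all = TRUE`, only `id` and `name` define a duplicate, and the remaining columns are taken from the first row of each combination; with no columns named, `distinct()` compares every column:

```r
library(dplyr)

# Question data plus a hypothetical extra column that should NOT
# take part in the duplicate check.
dframe <- data.frame(id = c(1, 2, 3, 2, 2),
                     name = c("Google", "Google", "Google", "Amazon", "Google"),
                     visits = c(10, 20, 30, 40, 50))

# Deduplicate on id and name only, but keep all columns in the result.
by_id_name <- dframe %>%
  distinct(id, name, .keep_all = TRUE)

# No columns named: all columns are compared. Rows 2 and 5 differ in
# `visits`, so nothing is removed in this example.
all_cols <- dframe %>%
  distinct()

print(by_id_name)
```

So yes, `distinct(id, name, .keep_all = TRUE)` is the form that checks only `id` and `name` while keeping the other columns.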
