Eliminate duplicates in R

Question

Suppose I have a data frame like this:

data <- data.frame(id = c(1, 1, 3, 4), n = c("x", "y", "e", "w"))
data
  id n
1  1 x
2  1 y
3  3 e
4  4 w

I want to get a new df like this:

data
  id n
3  3 e
4  4 w

That is, I want to remove every row whose id occurs more than once, not just the extra copies. I've tried functions like distinct from dplyr, but it always keeps one of the repeated rows.

score 5 · Accepted Answer · answered Jul 12 '21 at 21:13

Another option is subset with ave, which counts the rows in each id group and keeps only the groups of size 1:

subset(
    data,
    ave(n, id, FUN = length) == 1
)

gives

  id n
3  3 e
4  4 w
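To see why this works, it can help to look at what ave returns before the comparison. The following is a minimal sketch that recreates the data frame from the question; counting over seq_along(id) instead of n and the variable names group_size and result are my own choices for illustration, not part of the answer above.

```r
# Recreate the example data from the question
data <- data.frame(id = c(1, 1, 3, 4), n = c("x", "y", "e", "w"))

# ave() applies FUN within each group and returns one value per row;
# with FUN = length, every row receives the size of its id group
group_size <- ave(seq_along(data$id), data$id, FUN = length)
group_size

# Keep only the rows whose id group has exactly one member
result <- data[group_size == 1, ]
result
```

Counting over an integer vector keeps the result numeric, so the comparison with 1 does not rely on any type coercion.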

ThomasIsCoding

score 4 · Answer 2 · answered Jul 12 '21 at 21:08

We can use duplicated in both directions, so that every occurrence of a repeated id is flagged, not just the later ones:

subset(data, !(duplicated(id) | duplicated(id, fromLast = TRUE)))
  id n
3  3 e
4  4 w

Or use table to count how often each id occurs and keep the ids that occur once:

subset(data, id %in% names(which(table(id) == 1)))
  id n
3  3 e
4  4 w
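The table approach can be broken into steps to show the intermediate values. This is only a sketch on the question's data; the names counts, unique_ids and result are introduced here for illustration.

```r
# Recreate the example data from the question
data <- data.frame(id = c(1, 1, 3, 4), n = c("x", "y", "e", "w"))

# Frequency of each id; the names of the table are the ids as strings
counts <- table(data$id)
counts

# Names of the ids that occur exactly once (a character vector)
unique_ids <- names(counts[counts == 1])
unique_ids

# %in% compares the numeric ids with those strings after coercion
result <- data[data$id %in% unique_ids, ]
result
```

Because the match goes through the character form of id, this is best suited to integer-like or character ids; for ids with many decimal places the duplicated approach is probably the safer choice.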

akrun

score 1 · Answer 3 · edited Aug 13 '21 at 18:02

Although more verbose, you can also use base R.

data[!(duplicated(data["id"]) | duplicated(data["id"], fromLast = TRUE)), ]

Output

  id n
3  3 e
4  4 w
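If this pattern is needed in several places, it could be wrapped in a small function. The helper below is hypothetical (drop_repeated and its arguments df and cols are names made up for this sketch); it relies only on duplicated, which also accepts a data frame, so more than one key column should work as well.

```r
# Hypothetical helper: drop every row whose values in `cols`
# appear more than once in the data frame
drop_repeated <- function(df, cols) {
  key <- df[cols]  # single-bracket indexing keeps a data frame
  repeated <- duplicated(key) | duplicated(key, fromLast = TRUE)
  df[!repeated, , drop = FALSE]
}

# Usage on the example data from the question
data <- data.frame(id = c(1, 1, 3, 4), n = c("x", "y", "e", "w"))
result <- drop_repeated(data, "id")
result
```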

Or use dplyr.

library(dplyr)

data %>%
    dplyr::group_by(id) %>%
    dplyr::filter(n() == 1) %>%
    dplyr::ungroup()

edited Aug 13 '21 at 18:02

akrun

answered Jul 12 '21 at 21:30

AndrewGB

score 1 · Answer 4 · edited Jul 13 '21 at 06:53

Just adding to the already useful answers with a dplyr solution.

library(dplyr)

data %>%
    filter(!(duplicated(id, fromLast = FALSE) | duplicated(id, fromLast = TRUE)))

distinct won't work for you, because it keeps one row for every distinct value of id. In your case that means a row with id 1 is always part of the result.
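The difference can be illustrated in base R. As a rough analogy, !duplicated(id) keeps the first row of each id, which is similar to how distinct retains one row per distinct value, while the two-directional check drops repeated ids entirely. The names keep_first and keep_unique are only used for this sketch.

```r
# Recreate the example data from the question
data <- data.frame(id = c(1, 1, 3, 4), n = c("x", "y", "e", "w"))

# Keeps the first occurrence of every id: rows 1, 3 and 4 remain
keep_first <- data[!duplicated(data$id), ]
keep_first

# Drops every id that repeats: only rows 3 and 4 remain
is_repeated <- duplicated(data$id) | duplicated(data$id, fromLast = TRUE)
keep_unique <- data[!is_repeated, ]
keep_unique
```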

edited Jul 13 '21 at 06:53

AndrewGB

answered Jul 12 '21 at 21:59

Serkan
