I have a data frame like this:

data <- data.frame(id = c(1, 1, 3, 4), n = c("x", "y", "e", "w"))
data
  id n
1  1 x
2  1 y
3  3 e
4  4 w

I want to get a new df like this:

data
  id n
3  3 e
4  4 w

That is, I want to remove every row whose id is repeated, not just the extra copies. I've tried functions like distinct from dplyr, but it always keeps one of the repeated rows.

akrun
José Rojas

4 Answers

Another subset option with ave

subset(
    data,
    ave(n, id, FUN = length) == 1
)

gives

  id n
3  3 e
4  4 w
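
To see why this works, it can help to look at what ave returns on its own. The sketch below is a variant rather than the exact call above: it passes a row index (seq_along) instead of the column n, which is my own assumption so that the group sizes stay integer, and the name sizes is only for illustration.

```r
data <- data.frame(id = c(1, 1, 3, 4), n = c("x", "y", "e", "w"))

# ave() applies FUN within each id group and returns one value per row,
# so every row is labelled with the size of the group it belongs to
sizes <- ave(seq_along(data$id), data$id, FUN = length)

# Keep only the rows whose id occurs exactly once
result <- data[sizes == 1, ]
```

Here sizes is 2, 2, 1, 1, so only the rows with id 3 and id 4 are kept.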
ThomasIsCoding

We can use duplicated, scanning from both directions so that every copy of a repeated id is flagged:

subset(data, !(duplicated(id)|duplicated(id, fromLast = TRUE)))
  id n
3  3 e
4  4 w
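
A step-by-step sketch of the same idea, in case the one-liner is hard to read. The intermediate names (dup_forward, dup_backward, in_repeated_group) are only illustrative.

```r
data <- data.frame(id = c(1, 1, 3, 4), n = c("x", "y", "e", "w"))

# Flags every occurrence after the first: FALSE TRUE FALSE FALSE
dup_forward <- duplicated(data$id)

# Scanning from the end flags every occurrence before the last:
# TRUE FALSE FALSE FALSE
dup_backward <- duplicated(data$id, fromLast = TRUE)

# A row belongs to a repeated id if either scan flags it
in_repeated_group <- dup_forward | dup_backward

# Drop all rows of repeated ids
result <- data[!in_repeated_group, ]
```

Neither scan alone is enough, because each one leaves a single copy of the repeated id unflagged.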

Or use table to count how often each id occurs:

subset(data, id %in% names(which(table(id) == 1)))
  id n
3  3 e
4  4 w
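
The table version can be unpacked in the same way. This is only a sketch; counts and singles are illustrative names.

```r
data <- data.frame(id = c(1, 1, 3, 4), n = c("x", "y", "e", "w"))

# Frequency of each id: id 1 occurs twice, ids 3 and 4 once each
counts <- table(data$id)

# Names of the ids that occur exactly once (table names are strings)
singles <- names(which(counts == 1))

# %in% compares the numeric ids against those strings
result <- data[data$id %in% singles, ]
```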
akrun

Although more verbose, you can also use base R.

data[!(duplicated(data["id"]) | duplicated(data["id"], fromLast = TRUE)), ]

Output

  id n
3  3 e
4  4 w

Or use dplyr.

library(dplyr)

data %>%
    dplyr::group_by(id) %>%
    dplyr::filter(n() == 1) %>%
    dplyr::ungroup()
akrun
AndrewGB

Just adding to the already useful answers with a dplyr solution.

library(dplyr)

data %>%
    filter(!(duplicated(id) | duplicated(id, fromLast = TRUE)))

distinct won't work for you here: it keeps one row for each distinct value of id, so id 1 always remains in the result.
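
A small comparison, as a sketch, showing the difference on the question's data. It assumes dplyr is installed; kept_one and kept_none are illustrative names.

```r
data <- data.frame(id = c(1, 1, 3, 4), n = c("x", "y", "e", "w"))

# distinct() collapses each id to its first row, so id 1 survives once
kept_one <- dplyr::distinct(data, id, .keep_all = TRUE)

# Flagging duplicates from both directions removes every row of id 1
kept_none <- data[!(duplicated(data$id) | duplicated(data$id, fromLast = TRUE)), ]
```

kept_one has three rows (ids 1, 3, 4), while kept_none has only the two rows the question asks for.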

AndrewGB
Serkan