
I will borrow the example from another question of the same name: Remove duplicated rows using dplyr

set.seed(123)
df <- data.frame(x = sample(0:1, 10, replace = TRUE), y = sample(0:1, 10, replace = TRUE), z = 1:10)

> df
   x y  z
1  0 1  1
2  1 0  2
3  0 1  3
4  1 1  4
5  1 0  5
6  0 1  6
7  1 0  7
8  1 0  8
9  1 0  9
10 0 1 10

df[!duplicated(df[,1:2]),]
  x y z
1 0 1 1
2 1 0 2
4 1 1 4

The problem with this approach is that it keeps the first row of each duplicated group and removes only the later ones. I need to remove every row that has a duplicate, including the first occurrence.

The final result should contain only the one row that is unique:

  x y z
4 1 1 4

Already answered here: How can I remove all duplicates so that NONE are left in a data frame?

Thank you Jaap. I promise I searched, a lot, and all results I found were like the example I posted.

MichaelE
    The original turned out to be a different question about data sets, just poorly named. However, I still did not know how to remove all duplicates, hence this question. – MichaelE Apr 14 '20 at 20:56

1 Answer


We can use duplicated() together with fromLast = TRUE. On its own, duplicated() returns TRUE only for the second and later occurrences of a value: for the values 1, 2, 1, 2, 3 it gives FALSE, FALSE, TRUE, TRUE, FALSE. To flag the first occurrences as well, we also run it in reverse (fromLast = TRUE) and combine the two results with |, so a row is flagged if either pass marks it. Negating the combined vector keeps only the rows that occur exactly once.

df[!(duplicated(df[, 1:2]) | duplicated(df[, 1:2], fromLast = TRUE)), ]
#  x y z
#4 1 1 4
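
To see why both passes are needed, here is a minimal sketch on the small vector 1, 2, 1, 2, 3 mentioned in the explanation. The variable names (v, first_pass, last_pass, any_dup, unique_only) are illustrative only.

```r
v <- c(1, 2, 1, 2, 3)

# Forward pass: flags the second and later occurrences
first_pass <- duplicated(v)                   # FALSE FALSE  TRUE  TRUE FALSE

# Reverse pass: flags every occurrence except the last one
last_pass <- duplicated(v, fromLast = TRUE)   #  TRUE  TRUE FALSE FALSE FALSE

# A value that appears more than once is flagged by at least one pass
any_dup <- first_pass | last_pass             #  TRUE  TRUE  TRUE  TRUE FALSE

# Keep only the values that occur exactly once
unique_only <- v[!any_dup]                    # 3
```

The same logical vector, computed on df[, 1:2] instead of v, is what indexes the rows of the data frame.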

Or another option would be

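library(dplyr)
df %>%
     group_by(x, y) %>%
     filter(n() == 1) %>%   # keep only the (x, y) groups that contain a single row
     ungroup()              # drop the grouping so the result is a plain tibble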
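
The dplyr version works by counting the rows in each (x, y) group and keeping the groups of size one. The same counting idea can be sketched in base R with ave(); this is an illustrative alternative rather than part of the original answer, and group_size and result are names chosen here for the example.

```r
# Same data as in the "data" section of this answer
df <- data.frame(
  x = c(0L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 0L),
  y = c(1L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 1L),
  z = 1:10
)

# For every row, compute the number of rows sharing its (x, y) combination
group_size <- ave(seq_len(nrow(df)), df$x, df$y, FUN = length)

# Keep only the rows whose combination occurs exactly once
result <- df[group_size == 1, ]
result
```

Unlike the dplyr pipeline, this returns an ordinary data.frame and preserves the original row names.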

data

df <- structure(list(x = c(0L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 0L), 
    y = c(1L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 1L), z = 1:10), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10"))
akrun