How to completely remove duplicate rows from data?

Question

I will borrow the example from a similarly named question: Remove duplicated rows using dplyr

set.seed(123)
df <- data.frame(x = sample(0:1, 10, replace = TRUE), y = sample(0:1, 10, replace = TRUE), z = 1:10)

> df
   x y  z
1  0 1  1
2  1 0  2
3  0 1  3
4  1 1  4
5  1 0  5
6  0 1  6
7  1 0  7
8  1 0  8
9  1 0  9
10 0 1 10

df[!duplicated(df[,1:2]),]
  x y z
1 0 1 1
2 1 0 2
4 1 1 4

The problem with that example is that it keeps the first row of each duplicate group and removes only the later ones. I need to remove every row whose (x, y) combination occurs more than once, including the first occurrence.

The final result should contain only the one row that is unique:

  x y z
4 1 1 4

Already answered here: How can I remove all duplicates so that NONE are left in a data frame?

Thank you Jaap. I promise I searched, a lot, and all results I found were like the example I posted.

The original turned out to be a different question about data sets, just poorly named. However, I still did not know how to remove all duplicates, hence this question. – MichaelE, Apr 14 '20 at 20:56

akrun · Accepted Answer · 2020-04-14T20:58:06.823

0

We can use duplicated() together with its fromLast argument. On its own, duplicated() returns TRUE only for the second and later occurrences of a value: for 1, 2, 1, 2, 3 it gives FALSE, FALSE, TRUE, TRUE, FALSE. To flag the first occurrences as well, we also run it in reverse (fromLast = TRUE) and combine the two logical vectors with | (element-wise OR), so a row is flagged if either pass marks it. Negating the result with ! then keeps only the rows that have no duplicate at all.

df[!(duplicated(df[, 1:2]) | duplicated(df[, 1:2], fromLast = TRUE)), ]
#  x y z
#4 1 1 4
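
To see why both passes are needed, here is a small sketch on the toy vector from the explanation above. The variable names (v, fwd, bwd, dup_any) are made up for illustration and are not part of the original answer.

```r
# Toy vector from the explanation above
v <- c(1, 2, 1, 2, 3)

fwd <- duplicated(v)                   # flags later repeats only
bwd <- duplicated(v, fromLast = TRUE)  # flags earlier repeats only
dup_any <- fwd | bwd                   # TRUE for every value that occurs more than once

fwd          # FALSE FALSE  TRUE  TRUE FALSE
bwd          #  TRUE  TRUE FALSE FALSE FALSE
dup_any      #  TRUE  TRUE  TRUE  TRUE FALSE
v[!dup_any]  # 3
```

The same masks applied to df[, 1:2] flag rows instead of vector elements, which is what the one-liner above does.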

Another option is dplyr: group by x and y, then keep only the groups that contain exactly one row.

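library(dplyr)
df %>%
  group_by(x, y) %>%    # one group per (x, y) combination
  filter(n() == 1) %>%  # keep only groups that contain a single row
  ungroup()             # drop the grouping from the result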
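
For readers without dplyr, the same "group size equals one" idea can probably be sketched in base R with ave(). This is an assumed equivalent, not something from the original answer. The names grp_size and res are hypothetical, and df is rebuilt from the values listed in the data section.

```r
# Same data as in the "data" section of the answer
df <- data.frame(
  x = c(0L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 0L),
  y = c(1L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 1L),
  z = 1:10
)

# For every row, count how many rows share its (x, y) combination
grp_size <- ave(seq_len(nrow(df)), df$x, df$y, FUN = length)

# Keep only the rows whose combination occurs exactly once
res <- df[grp_size == 1, ]
res
```

Unlike the dplyr pipeline, this returns a plain data.frame and keeps the original row names.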

data

df <- structure(list(x = c(0L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 0L), 
    y = c(1L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 1L), z = 1:10), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10"))

edited Apr 14 '20 at 20:58

answered Apr 14 '20 at 20:47

akrun


That works, but I don't understand how. Can you point me to a tutorial on this syntax, the | specifically? Thanks. – MichaelE Apr 14 '20 at 20:55
Never mind, the link Jaap posted explains it, and | is just an OR, as expected. – MichaelE Apr 14 '20 at 21:14
