Remove all duplicated rows

Question

Starting with this data.frame:

df <- data.frame(x = c(0, 1, 0, 2, 1, 0, 1, 1, 1, 0),
                 y = c(1, 0, 2, 1, 3, 1, 0, 0, 0, 1),
                 z = 1:10)
> df
   x y  z
1  0 1  1
2  1 0  2
3  0 2  3
4  2 1  4
5  1 3  5
6  0 1  6
7  1 0  7
8  1 0  8
9  1 0  9
10 0 1 10

I would like to remove ALL rows that are duplicated on the first two columns, including the first occurrence. distinct() from dplyr always keeps the first row of each duplicated group. I'm looking for a method that drops every row whose (x, y) combination appears more than once.

Expected output:

  x y z
3 0 2 3
4 2 1 4
5 1 3 5

score 1 · Accepted Answer · answered Sep 14 '22 at 19:59

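library(dplyr)

df %>%
  group_by(x, y) %>%    # one group per (x, y) combination
  filter(n() == 1) %>%  # keep only combinations that occur exactly once
  ungroup()             # drop the grouping from the result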
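
The filter works because n() returns the size of the current group, so only (x, y) combinations that occur exactly once survive. As a rough sketch of the same idea without an explicit group_by(), the count can be attached as a temporary column with add_count(). The data frame is rebuilt here from the values printed in the question, and the names n_rows and unique_rows are illustrative only:

```r
library(dplyr)

# Data frame rebuilt from the values printed in the question
df <- data.frame(
  x = c(0, 1, 0, 2, 1, 0, 1, 1, 1, 0),
  y = c(1, 0, 2, 1, 3, 1, 0, 0, 0, 1),
  z = 1:10
)

unique_rows <- df %>%
  add_count(x, y, name = "n_rows") %>%  # n_rows = number of rows sharing this (x, y)
  filter(n_rows == 1) %>%               # keep combinations that appear exactly once
  select(-n_rows)                       # remove the helper column again

unique_rows
```

Both forms should keep the same rows; the add_count() variant just makes the group size visible as a column if you want to inspect it first.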

Abigail

score 0 · Answer 2 · answered Sep 14 '22 at 20:02

In base R you can do

df[!(duplicated(df[1:2]) | duplicated(df[1:2], fromLast = TRUE)),]
#>   x y z
#> 3 0 2 3
#> 4 2 1 4
#> 5 1 3 5
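
To see why both calls are needed: duplicated() on its own flags only the second and later occurrences of a key, so the first copy of each duplicated pair would survive. Scanning again with fromLast = TRUE flags everything except the last occurrence, and the union of the two covers every row whose key appears more than once. A minimal sketch of the two passes, with the data frame rebuilt from the values printed in the question (key, dup_forward, dup_backward and result are illustrative names):

```r
# Data frame rebuilt from the values printed in the question
df <- data.frame(
  x = c(0, 1, 0, 2, 1, 0, 1, 1, 1, 0),
  y = c(1, 0, 2, 1, 3, 1, 0, 0, 0, 1),
  z = 1:10
)

key <- df[1:2]                                    # columns that define a duplicate
dup_forward  <- duplicated(key)                   # TRUE for the 2nd, 3rd, ... occurrence
dup_backward <- duplicated(key, fromLast = TRUE)  # TRUE for all but the last occurrence

# A row is kept only if its key is flagged in neither direction
result <- df[!(dup_forward | dup_backward), ]
result
```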

Allan Cameron
