
Starting with this data.frame:

df <- data.frame(
  x = c(0, 1, 0, 2, 1, 0, 1, 1, 1, 0),
  y = c(1, 0, 2, 1, 3, 1, 0, 0, 0, 1),
  z = 1:10
)
> df
   x y  z
1  0 1  1
2  1 0  2
3  0 2  3
4  2 1  4
5  1 3  5
6  0 1  6
7  1 0  7
8  1 0  8
9  1 0  9
10 0 1 10

I would like to remove ALL rows whose combination of the first two columns (x and y) appears more than once. Using distinct() from dplyr always keeps the first row of each combination; I'm looking for a method that throws out every row that has a duplicate.
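To make the problem concrete, here is a minimal base R sketch of the keep-first behaviour. It uses duplicated() as a stand-in for dplyr::distinct(df, x, y, .keep_all = TRUE), since both retain the first row of each x/y combination. The data frame is rebuilt by hand from the printed table so the snippet runs on its own.

```r
# Rebuild the example data from the printed table
df <- data.frame(
  x = c(0, 1, 0, 2, 1, 0, 1, 1, 1, 0),
  y = c(1, 0, 2, 1, 3, 1, 0, 0, 0, 1),
  z = 1:10
)

# duplicated() flags only the 2nd, 3rd, ... occurrence of each (x, y) pair,
# so negating it keeps the first occurrence of every pair
first_kept <- df[!duplicated(df[c("x", "y")]), ]
first_kept
```

Rows 1 and 2 survive here even though their pairs (0, 1) and (1, 0) occur again later, which is exactly what the question wants to avoid.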

Expected output:

  x y z
3 0 2 3
4 2 1 4
5 1 3 5
fn2197

2 Answers

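library(tidyverse)

df %>%
  group_by(x, y) %>%   # one group per (x, y) combination
  filter(n() == 1) %>% # keep only groups that contain a single row
  ungroup()            # drop the grouping from the result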
Abigail
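A closely related dplyr sketch avoids the grouped result altogether: add_count() appends the size of each (x, y) group as an ordinary column (named n by default), so the filter runs on a plain column and the helper column is dropped afterwards.

```r
library(dplyr)

df <- data.frame(
  x = c(0, 1, 0, 2, 1, 0, 1, 1, 1, 0),
  y = c(1, 0, 2, 1, 3, 1, 0, 0, 0, 1),
  z = 1:10
)

unique_rows <- df %>%
  add_count(x, y) %>%  # new column `n`: how many rows share this (x, y)
  filter(n == 1) %>%   # keep rows whose pair occurs exactly once
  select(-n)           # remove the helper column

unique_rows
```

Row order is preserved, so the result matches the expected output (rows with z = 3, 4, 5).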

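In base R you can combine duplicated() scanning from the top with duplicated(..., fromLast = TRUE) scanning from the bottom, so every occurrence of a repeated x/y pair is flagged, not only the later copies: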

df[!(duplicated(df[1:2]) | duplicated(df[1:2], fromLast = TRUE)),]
#>   x y z
#> 3 0 2 3
#> 4 2 1 4
#> 5 1 3 5
Allan Cameron
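The base R one-liner can be unpacked step by step to show why both scans are needed. The intermediate names (key, dup_forward, dup_backward, has_duplicate) are illustrative only.

```r
df <- data.frame(
  x = c(0, 1, 0, 2, 1, 0, 1, 1, 1, 0),
  y = c(1, 0, 2, 1, 3, 1, 0, 0, 0, 1),
  z = 1:10
)

key <- df[c("x", "y")]

# TRUE for every occurrence after the first (scanning top to bottom)
dup_forward <- duplicated(key)
# TRUE for every occurrence before the last (scanning bottom to top)
dup_backward <- duplicated(key, fromLast = TRUE)

# A pair that occurs more than once is caught by at least one scan,
# while a pair that occurs once is caught by neither
has_duplicate <- dup_forward | dup_backward

# Rows 1, 2 and 6-10 are flagged; only rows 3, 4 and 5 remain
df[!has_duplicate, ]
```

Using either scan alone would leave one copy of each repeated pair behind (the first with dup_forward, the last with dup_backward), which is the same keep-one behaviour as distinct().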