Starting with this data.frame:
df <- data.frame(x = c(0, 1, 0, 2, 1, 0, 1, 1, 1, 0),
                 y = c(1, 0, 2, 1, 3, 1, 0, 0, 0, 1),
                 z = 1:10)
> df
   x y  z
1  0 1  1
2  1 0  2
3  0 2  3
4  2 1  4
5  1 3  5
6  0 1  6
7  1 0  7
8  1 0  8
9  1 0  9
10 0 1 10
I would like to remove ALL rows that have duplicates based on the first two columns. Using distinct() from dplyr always keeps the first row of each duplicated group. I'm looking for a method that drops every row that has a duplicate, including the first occurrence.
Expected output:
  x y z
3 0 2 3
4 2 1 4
5 1 3 5
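
For reference, one possible way to get that output is sketched below in base R rather than dplyr. It assumes the data frame shown above; the names keys, dup and result are only illustrative. The idea is to flag duplicates scanning from both ends, so that the first occurrence of each repeated (x, y) pair is flagged too.

```r
# Same data as printed above
df <- data.frame(x = c(0, 1, 0, 2, 1, 0, 1, 1, 1, 0),
                 y = c(1, 0, 2, 1, 3, 1, 0, 0, 0, 1),
                 z = 1:10)

# Columns that define a duplicate
keys <- df[, c("x", "y")]

# duplicated() alone leaves the first occurrence unflagged;
# combining it with a scan from the end (fromLast = TRUE) flags every member
# of a repeated group
dup <- duplicated(keys) | duplicated(keys, fromLast = TRUE)

# Keep only rows whose (x, y) pair occurs exactly once
result <- df[!dup, ]
print(result)
```

With the data above, only rows 3, 4 and 5 remain, which matches the expected output.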