Remove duplicates across columns

Question

df
     [,1] [,2]
 [1,] "a"  "b" 
 [2,] "a"  "c"
 [3,] "a"  "d"
 [4,] "b"  "a"
 [5,] "b"  "c"
 [6,] "b"  "d" 
 [7,] "c"  "a" 
 [8,] "c"  "b" 
 [9,] "c"  "d"

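Let's assume I have a data.frame like this (strictly, the example data below is a character matrix), and I want to remove rows that are duplicates across the columns, i.e. rows containing the same pair of values regardless of order. For example, row 4 ("b" "a") duplicates row 1 ("a" "b").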

df1
     [,1] [,2]
 [1,] "a"  "b" 
 [2,] "a"  "c"
 [3,] "a"  "d"
 [5,] "b"  "c"
 [6,] "b"  "d" 
 [9,] "c"  "d"

I want to end up like this.

Take a look at https://stackoverflow.com/questions/9028369/removing-duplicate-combinations-irrespective-of-order and the many linked questions in the side bar of that question for variations on this same question. — thelatemail, Sep 09 '20 at 04:50

akrun · Accepted Answer · 2020-09-09T04:45:48.813


We can sort the elements in each row with apply, transpose the output (apply returns the sorted rows as columns), apply duplicated to get a logical vector that is TRUE for repeated pairs, and use its negation to subset the rows:

df[!duplicated(t(apply(df[, 1:2], 1, sort))),]
#     [,1] [,2]
#[1,] "a"  "b" 
#[2,] "a"  "c" 
#[3,] "a"  "d" 
#[4,] "b"  "c" 
#[5,] "b"  "d" 
#[6,] "c"  "d"
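To see what each piece of the one-liner does, here is a sketch that runs the same apply/t/duplicated chain one step at a time on the example data from the answer. The intermediate names (sorted_cols, sorted_rows, dup, result) are only illustrative.

```r
# Rebuild the example matrix from the "data" section of the answer
df <- structure(c("a", "a", "a", "b", "b", "b", "c", "c", "c", "b",
"c", "d", "a", "c", "d", "a", "b", "d"), .Dim = c(9L, 2L))

# Step 1: sort each row. apply() with MARGIN = 1 returns one result per
# column, so the sorted rows come back as the columns of a 2 x 9 matrix.
sorted_cols <- apply(df[, 1:2], 1, sort)
print(dim(sorted_cols))

# Step 2: transpose so each sorted pair is a row again (9 x 2).
sorted_rows <- t(sorted_cols)

# Step 3: duplicated() on a matrix compares whole rows, flagging each row
# whose sorted pair already appeared earlier.
# Here rows 4, 7 and 8 are TRUE; all others are FALSE.
dup <- duplicated(sorted_rows)
print(dup)

# Step 4: keep only the first occurrence of each unordered pair.
result <- df[!dup, ]
print(result)
```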

Another option is pmin/pmax:

df[!duplicated(cbind(pmin(df[,1], df[,2]), pmax(df[,1], df[,2]))),]
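The pmin/pmax version works because pmin and pmax are element-wise and also accept character vectors, so for each row they return the smaller and larger value in the current locale's collation order. The sketch below builds that order-normalised key explicitly and checks it against the sort-based result; the names lo, hi, key, result_pm and result_sort are illustrative.

```r
df <- structure(c("a", "a", "a", "b", "b", "b", "c", "c", "c", "b",
"c", "d", "a", "c", "d", "a", "b", "d"), .Dim = c(9L, 2L))

# Element-wise smaller and larger value of each row's pair
lo <- pmin(df[, 1], df[, 2])
hi <- pmax(df[, 1], df[, 2])

# 9 x 2 key matrix where ("b", "a") and ("a", "b") both become ("a", "b")
key <- cbind(lo, hi)

# Keep the first occurrence of each normalised pair
result_pm <- df[!duplicated(key), ]

# Compare with the apply()/sort() approach from the answer
result_sort <- df[!duplicated(t(apply(df[, 1:2], 1, sort))), ]
stopifnot(identical(result_pm, result_sort))
print(result_pm)
```

Because pmin/pmax are vectorised, this avoids calling sort once per row, but it only covers two columns; the apply/sort version extends to more columns.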

data

df <- structure(c("a", "a", "a", "b", "b", "b", "c", "c", "c", "b", 
"c", "d", "a", "c", "d", "a", "b", "d"), .Dim = c(9L, 2L))

edited Sep 09 '20 at 04:45

answered Sep 09 '20 at 04:39

akrun


How would this change if there is a 3rd column in df, but we are only interested in duplicates based on columns 1 and 2? – iHermes Sep 09 '20 at 04:45

@iHermes You can just subset the first two columns (df[, 1:2]); I've updated the answer. – akrun Sep 09 '20 at 04:46
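For the third-column case, here is a minimal sketch assuming a hypothetical data.frame dat whose extra column is named value. Only columns 1 and 2 feed the duplicate check, and the whole row (including value) is kept for the first occurrence of each pair.

```r
# Hypothetical data frame: the example's two columns plus an extra
# numeric column "value" (an assumption for illustration).
dat <- data.frame(
  V1 = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  V2 = c("b", "c", "d", "a", "c", "d", "a", "b", "d"),
  value = 1:9,
  stringsAsFactors = FALSE
)

# Sort only columns 1 and 2 within each row to build the key
key <- t(apply(dat[, 1:2], 1, sort))

# Subset the full data frame, so the third column is carried along
dedup <- dat[!duplicated(key), ]
print(dedup)
```

Restricting apply to columns 1 and 2 also matters because apply converts its input to a matrix: passing all three columns would coerce value to character and make it part of the sorted key.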
