R deleting duplicates in other columns

Question

Hey guys, I definitely solved this problem before but lost my code. Here is a simplified version of what I have.

a1 <- c(1,2,4,3,5)
a2 <- c("a","b","b","c","f")
a3 <- c(3,4,"b",1,9)
a4 <- c("c","b",2,"a","d")
a <- cbind(a1,a2,a3,a4)
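The columns mix numbers and letters, so cbind() coerces everything to a common type. As a result 'a' is a character matrix, and every value, including the numbers, is stored as a string. A quick check (the print calls are only for illustration):

```r
a1 <- c(1,2,4,3,5)
a2 <- c("a","b","b","c","f")
a3 <- c(3,4,"b",1,9)
a4 <- c("c","b",2,"a","d")
a <- cbind(a1,a2,a3,a4)

# cbind() coerces all columns to one type; with any strings present it is character
print(typeof(a))          # [1] "character"
print(dim(a))             # [1] 5 4
# The numbers are now strings, e.g. row 4:
print(unname(a[4, ]))     # [1] "3" "c" "1" "a"
```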

a1 and a2 form a set, as do a3 and a4. For example, row 4 (3, c, 1, a) contains the same two sets as row 1 (1, a, 3, c), just swapped.

I would like to remove the duplicates, so here rows 3 and 4 should be removed. This data comes from BLAST, showing links between genomes, and it is 34,000 rows long, so an efficient solution would be great.
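A vectorized sketch that keeps each set together, assuming (as the word "set" suggests) that order inside a set does not matter and that the two sets may appear in either order. Each set becomes a canonical string key via pmin()/pmax(), then the two set keys are ordered the same way, so duplicated() sees one key per row. The helper name dedupe_pairs(), the column layout (1-2 and 3-4), and the tab/newline separators are assumptions for illustration; the separators must not occur in the data.

```r
# Hypothetical helper: drop rows whose two unordered sets repeat an earlier row.
# Assumes columns 1-2 form one set and columns 3-4 the other.
dedupe_pairs <- function(m) {
  # Byte-wise ("C") collation so string comparisons are a strict total order;
  # the previous collation locale is restored when the function exits
  old <- Sys.getlocale("LC_COLLATE")
  on.exit(Sys.setlocale("LC_COLLATE", old))
  Sys.setlocale("LC_COLLATE", "C")

  # Canonical key for an unordered set: smaller value first
  pair_key <- function(x, y) paste(pmin(x, y), pmax(x, y), sep = "\t")
  k1 <- pair_key(m[, 1], m[, 2])
  k2 <- pair_key(m[, 3], m[, 4])

  # Canonical key for the row: the two set keys in sorted order
  row_key <- paste(pmin(k1, k2), pmax(k1, k2), sep = "\n")
  m[!duplicated(row_key), , drop = FALSE]
}

a1 <- c(1,2,4,3,5)
a2 <- c("a","b","b","c","f")
a3 <- c(3,4,"b",1,9)
a4 <- c("c","b",2,"a","d")
a <- cbind(a1,a2,a3,a4)

res <- dedupe_pairs(a)
print(res)
```

On the example data this keeps rows 1, 2 and 5, the same rows as the accepted answer. Because values are never moved between sets, a row such as (1, c, 3, a) is not treated as a duplicate of (1, a, 3, c). Everything runs on whole columns, so there is no per-row R function call.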

Thank you so much! I would also be open to doing this in another language.

score 0 · Accepted Answer · answered Aug 31 '16 at 19:17

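We can sort the values within each row of 'a', take the logical index of the rows that are not (!) duplicated, and use it to filter the rows. Since apply(a, 1, sort) returns each sorted row as a column, t() transposes the result back so that duplicated() compares rows.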

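# sort each row, transpose back, and flag the first occurrence of each sorted row
i1 <- !duplicated(t(apply(a, 1, sort)))
# keep only those rows (a new name, so the original a1 vector is not overwritten)
res <- a[i1, ]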

The indices of the rows that remain in the dataset are

which(i1)
#[1] 1 2 5
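One thing to be aware of with sorting the whole row: it pools all four values, so two rows holding the same values in different sets are also flagged as duplicates. Whether that can happen depends on the data. A small check, using two made-up rows for illustration:

```r
# Two rows with the same four values but different sets:
# (1, a) + (3, c) versus (1, c) + (3, a)
b <- rbind(c("1", "a", "3", "c"),
           c("1", "c", "3", "a"))

# Sorting each row pools the sets, so both rows reduce to the same vector
sorted_rows <- t(apply(b, 1, sort))
flag <- duplicated(sorted_rows)
print(flag)   # [1] FALSE  TRUE
```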

akrun