I want to remove rows that are duplicates regardless of column order: for example, if one row is (A, B) and another is (B, A), I want to keep just one of them. I have a dataframe that looks like this:

|A      |B      |
|-------|-------|
|A1CF   |APOBEC1|
|A1CF   |KHSRP  |
|A1CF   |SYNCRIP|
|APOBEC1|A1CF   |
|SYNCRIP|A1CF   |

and my expected output is like this:

|A      |B      |
|-------|-------|
|A1CF   |APOBEC1|
|A1CF   |KHSRP  |
|A1CF   |SYNCRIP|

I have tried this, but it doesn't work:

df[!duplicated(df[,c("A","B")]),]

1 Answer

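One option would be to use a least/greatest trick: put the smaller value of each pair in the first column and the larger in the second, so that (A, B) and (B, A) become identical rows, and then remove duplicates: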

library(SparkR)

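# Normalize each pair (smaller value first), then keep only distinct rows
df <- distinct(select(df, alias(least(df$A, df$B), "A"), alias(greatest(df$A, df$B), "B")))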

Here is a base R version of the above:

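# Assumes A and B are character columns (wrap them in as.character() if they are factors)
df <- unique(data.frame(A = ifelse(df$A < df$B, df$A, df$B),    # smaller value of each pair
                        B = ifelse(df$A >= df$B, df$A, df$B)))  # larger value of each pair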
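
Note that the least/greatest approach sorts the values within each row, so a surviving row such as SYNCRIP/A1CF would come back as A1CF/SYNCRIP. If the rows should stay exactly as written, a minimal sketch of an alternative is to build the sorted pair only as a lookup key and filter the original data with `duplicated()`. This assumes character columns; the sample data is recreated from the question, and `key` and `result` are illustrative names:

```r
# Sample data from the question (recreated here for illustration)
df <- data.frame(
  A = c("A1CF", "A1CF", "A1CF", "APOBEC1", "SYNCRIP"),
  B = c("APOBEC1", "KHSRP", "SYNCRIP", "A1CF", "A1CF"),
  stringsAsFactors = FALSE
)

# Order-independent key: smaller value in `lo`, larger value in `hi`
key <- data.frame(lo = pmin(df$A, df$B), hi = pmax(df$A, df$B))

# Keep the first occurrence of each unordered pair; rows are not rewritten
result <- df[!duplicated(key), ]
print(result)
```

The key is only used to decide which rows to drop, so `result` contains the first row seen for each unordered pair, with its original column order.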
Tim Biegeleisen