Delete duplicated rows with the same values but in different columns in R

Question

I want to remove rows that are duplicates of each other regardless of column order. For example, if one row has A = x, B = y and another has A = y, B = x, I want to keep just one of them. I have a data frame that looks like this:

|A      |B      |
|-------|-------|
|A1CF   |APOBEC1|
|A1CF   |KHSRP  |
|A1CF   |SYNCRIP|
|APOBEC1|A1CF   |
|SYNCRIP|A1CF   |

and my expected output is like this:

|A      |B      |
|-------|-------|
|A1CF   |APOBEC1|
|A1CF   |KHSRP  |
|A1CF   |SYNCRIP|

I have tried this, but it doesn't work:

df[!duplicated(df[,c("A","B")]),]

Tim Biegeleisen · Accepted Answer · 2021-04-25T07:45:42.817

1

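One option would be to use a least/greatest trick: put the smaller value of each pair in the first column and the larger in the second, so that (x, y) and (y, x) become identical rows, and then remove duplicates: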

library(SparkR)

df <- unique(cbind(least(df$A, df$B), greatest(df$A, df$B)))

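Here is a base R version of the above. It assumes A and B are character columns (not factors), since `<` is not meaningful for unordered factors:

# Smaller value of each pair goes in A, larger in B; data.frame() keeps the column names
df <- unique(data.frame(A = ifelse(df$A < df$B, df$A, df$B),
                        B = ifelse(df$A >= df$B, df$A, df$B)))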
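The same least/greatest idea can also be sketched with base R's `pmin()` and `pmax()`, using the result only as a lookup key so that the surviving rows keep their original values and column order. This is a minimal sketch rather than part of the original answer: the sample data is copied from the question, `key` and `result` are illustrative names, and it assumes both columns are character vectors.

```r
# Sample data copied from the question; stringsAsFactors = FALSE keeps
# A and B as character vectors so pmin()/pmax() compare strings.
df <- data.frame(
  A = c("A1CF", "A1CF", "A1CF", "APOBEC1", "SYNCRIP"),
  B = c("APOBEC1", "KHSRP", "SYNCRIP", "A1CF", "A1CF"),
  stringsAsFactors = FALSE
)

# Order-independent key: smaller value first, larger value second,
# so (x, y) and (y, x) map to the same key row.
key <- data.frame(lo = pmin(df$A, df$B), hi = pmax(df$A, df$B))

# Keep the first row seen for each unordered pair, unchanged.
result <- df[!duplicated(key), ]
print(result)
```

Because `duplicated()` is applied to the key rather than to `df` itself, the first occurrence of each pair is returned exactly as it appeared in the input. The original attempt, `duplicated(df[, c("A", "B")])`, compares rows position by position, which is why it treats (A1CF, APOBEC1) and (APOBEC1, A1CF) as different rows.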

edited Apr 25 '21 at 07:45

answered Apr 25 '21 at 07:31

Tim Biegeleisen

Hi, thanks for replying. I tried this but SparkR is not available for Rstudio 4.0.5. – Jenny Empawi Apr 25 '21 at 07:43
@JennyEmpawi I updated with a base R version which should run everywhere. Slightly more verbose though. – Tim Biegeleisen Apr 25 '21 at 07:46
