This seems simple, but I couldn't think of a solution. I have a data frame containing rows like this:

ColumnA   ColumnB  
protein1  protein2  
protein2  protein1

with the remaining columns being the same. I would like to keep only one row of each such pair, since they are duplicates for my analysis. I have a vector containing protein1 and protein2, and I identified the affected entries based on that vector, but the data frame has about 100K lines in total. I just couldn't think of a way to selectively remove them. Does anybody have an idea?

ThomasIsCoding

1 Answer

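You can try igraph as shown below. Each row is treated as an undirected edge, so protein1-protein2 and protein2-protein1 become the same edge: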

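library(igraph)
# Build an undirected graph from the rows, convert it back to an edge list, and drop repeated rows
unique(igraph::as_data_frame(graph_from_data_frame(df, directed = FALSE)))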

which gives

      from       to
1 protein1 protein2
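
If you would rather avoid the igraph dependency and keep the original column names, a base R alternative might look like the sketch below. It is a different technique from the igraph approach above: it orders each pair alphabetically with pmin() and pmax() and then uses duplicated() on that ordered key. The names first, second and deduped are made up for illustration, and the sketch assumes that only ColumnA and ColumnB decide whether two rows are duplicates.

```r
# Example data, same values as the dput() output in this answer
df <- data.frame(
  ColumnA = c("protein1", "protein2"),
  ColumnB = c("protein2", "protein1"),
  stringsAsFactors = FALSE
)

# Order each pair alphabetically so (A, B) and (B, A) share the same key
first  <- pmin(df$ColumnA, df$ColumnB)
second <- pmax(df$ColumnA, df$ColumnB)

# Keep only the first row seen for each unordered pair
deduped <- df[!duplicated(data.frame(first, second)), , drop = FALSE]
print(deduped)
```

If other columns should also count when deciding what is a duplicate, they could be added to the data.frame passed to duplicated().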

data

> dput(df)
structure(list(ColumnA = c("protein1", "protein2"), ColumnB = c("protein2", 
"protein1")), class = "data.frame", row.names = c(NA, -2L))
ThomasIsCoding