remove duplicate rows regardless of order

Question

I have a data.frame like so:

df <- structure(list(X1 = c("PF00041", "PF00041", "PF00041", "PF00041", 
"PF00041", "PF00041", "PF00041", "PF00041", "PF00041", "PF00047", 
"PF00041", "PF00041", "PF00041", "PF00054", "PF00054", "PF02210", 
"PF07679", "PF07714", "PF07714", "PF07714", "PF07714", "PF07714", 
"PF07714", "PF00041", "PF00041", "PF00041"), X2 = c("PF00041", 
"PF00041", "PF00041", "PF00041", "PF00041", "PF00041", "PF07679", 
"PF07679", "PF07679", "PF13895", "PF00047", "PF00047", "PF00047", 
"PF02210", "PF13895", "PF07645", "PF13895", "PF07714", "PF07714", 
"PF07714", "PF07714", "PF07714", "PF07714", "PF13895", "PF13895", 
"PF13895"), pfam_name.x = c("fn3", "fn3", "fn3", "fn3", "fn3", 
"fn3", "fn3", "fn3", "fn3", "ig", "fn3", "fn3", "fn3", "Laminin_G_1", 
"Laminin_G_1", "Laminin_G_2", "I-set", "Pkinase_Tyr", "Pkinase_Tyr", 
"Pkinase_Tyr", "Pkinase_Tyr", "Pkinase_Tyr", "Pkinase_Tyr", "fn3", 
"fn3", "fn3"), pfam_name.y = c("fn3", "fn3", "fn3", "fn3", "fn3", 
"fn3", "I-set", "I-set", "I-set", "Ig_2", "ig", "ig", "ig", "Laminin_G_2", 
"Ig_2", "EGF_CA", "Ig_2", "Pkinase_Tyr", "Pkinase_Tyr", "Pkinase_Tyr", 
"Pkinase_Tyr", "Pkinase_Tyr", "Pkinase_Tyr", "Ig_2", "Ig_2", 
"Ig_2"), value.x = c("5", "5", "13", "13", "17", "17", "5", "13", 
"17", "18", "5", "13", "17", "11", "11", "12", "14", "6", "6", 
"15", "15", "20", "20", "5", "13", "17"), value.y = c("13", "17", 
"5", "17", "5", "13", "14", "14", "14", "19", "18", "18", "18", 
"12", "19", "8", "19", "15", "20", "6", "20", "6", "15", "19", 
"19", "19")), row.names = c(2L, 3L, 4L, 6L, 7L, 8L, 10L, 11L, 
12L, 13L, 15L, 16L, 17L, 19L, 20L, 25L, 27L, 29L, 30L, 31L, 33L, 
34L, 35L, 38L, 39L, 40L), class = "data.frame")

I'd like to filter this data.frame based on the columns value.x and value.y, but I don't want to keep rows where those two values are merely swapped. For example, row 1 has values 5 and 13 and row 3 has values 13 and 5, so I want to get rid of row 3.

I tried sorting at first, but because I have other columns, the sort mixes the values of all the columns together. For example:

data.frame(unique(t(apply(df, 1, sort))), stringsAsFactors = FALSE)

In the resulting table, Pkinase_Tyr ended up in column X1.

It's the other columns that make this question different than that one. I can't simply sort because the whole table gets messed up that way — Beeba, Jun 24 '19 at 18:20
Instead of applying over `df`, apply over `df[vector_of_relevant_columns]` — IceCreamToucan, Jun 24 '19 at 18:22
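A minimal sketch of that suggestion, assuming only value.x and value.y should decide whether two rows count as duplicates. The small `toy` data frame and the names `value_cols`, `sorted_pairs`, and `toy_dedup` are made up for illustration; they are not from the question. The sort is applied only to the two value columns, and the result is used as a lookup key, so the other columns are never rearranged.

```r
# Hypothetical toy data with the same column names as in the question.
toy <- data.frame(
  X1 = c("PF00041", "PF00041", "PF00041", "PF07714"),
  X2 = c("PF00041", "PF00041", "PF07679", "PF07714"),
  value.x = c("5", "13", "5", "6"),
  value.y = c("13", "5", "14", "15"),
  stringsAsFactors = FALSE
)

# Sort each row of the two relevant columns only. The values are character
# strings here, so the sort is alphabetical, but any consistent ordering is
# enough to make (5, 13) and (13, 5) produce the same key.
value_cols <- c("value.x", "value.y")
sorted_pairs <- t(apply(toy[value_cols], 1, sort))

# Keep the first occurrence of each unordered pair; all columns of the
# original rows are left as they were.
toy_dedup <- toy[!duplicated(sorted_pairs), ]
print(toy_dedup)
```

Replacing `toy` with `df` should work the same way, as long as the two column names match.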
Also, you may want to look at the answer which uses `pmax` and `pmin`, since it names the two columns explicitly and therefore would work the same even if `df` has more columns. — IceCreamToucan, Jun 24 '19 at 18:24
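A rough sketch of the `pmin`/`pmax` idea, again on a made-up data frame (`toy2`, `lo`, `hi`, and `keep` are illustrative names, not from the question). It assumes the value columns hold numbers stored as strings, as in the `dput` output above, and converts them with `as.numeric` so the comparison is numeric rather than alphabetical.

```r
# Hypothetical toy data with the same column names as in the question.
toy2 <- data.frame(
  X1 = c("PF07714", "PF07714", "PF07714", "PF00041"),
  X2 = c("PF07714", "PF07714", "PF07714", "PF13895"),
  value.x = c("6", "15", "20", "5"),
  value.y = c("15", "6", "6", "19"),
  stringsAsFactors = FALSE
)

# Convert to numeric (assumption: every value parses as a number).
vx <- as.numeric(toy2$value.x)
vy <- as.numeric(toy2$value.y)

# Element-wise smaller and larger value of each pair; a swapped pair
# gives the same (lo, hi) combination as the original.
lo <- pmin(vx, vy)
hi <- pmax(vx, vy)

# Keep only the first row for each (lo, hi) combination.
keep <- !duplicated(data.frame(lo, hi))
toy2_dedup <- toy2[keep, ]
print(toy2_dedup)
```

Because the two columns are named explicitly, extra columns in the data frame do not affect the result.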
