Remove duplicate rows ignoring column order? in R

Question

Hi apologies if this has been asked before, I couldn't find an answer to my question but I've been unsure on the best way to word this - so that might have hindered my searches!

So my data was originally in a matrix like this:

	a	b	c
a	0	0.576	0.987
b	0.576	0	0.034
c	0.987	0.034	0

I converted this to long format using melt, and removed the zero values where a value would map to itself, so the data now looks like this:

var1	var2	value
a	b	0.576
b	a	0.576
a	c	0.987
c	a	0.987
b	c	0.034
c	b	0.034

What's the best way to remove the duplicate rows so that the data would look like:

var1	var2	value
a	b	0.576
a	c	0.987
b	c	0.034

Ottie · Accepted Answer · 2022-08-05T11:10:20.133

df <- data.frame(
    var1 = c("a", "b", "a", "c", "b", "c"), 
    var2 = c("b", "a", "c", "a", "c", "b"), 
    value = c(0.576, 0.576, 0.987, 0.987, 0.034, 0.034)
)

A one-liner base-r solution:

df_unique <- df[!duplicated(apply(df[,1:2], 1, function(row) paste(sort(row), collapse=""))),]

df_unique
  var1 var2 value
1    a    b 0.576
3    a    c 0.987
5    b    c 0.034

What it does: work across the first 2 columns row-wise (apply with MARGIN = 1), sort (alphabetically) the content, paste into a single string, remove all indices where the string has already occurred before (!duplicated).

Another (probably better) approach, stepping back, is to take your original matrix and clear out the bottom half using lower.tri. This way only half of the combinations will have non-0 values:

mat <- matrix(c(0, 0.576, 0.987, 0.576, 0, 0.034, 0.987, 0.034, 0), 
              nrow=3, dimnames=list(letters[1:3], letters[1:3]))

mat[lower.tri(mat, diag = TRUE)] <- NA
mat
   a     b     c
a NA 0.576 0.987
b NA    NA 0.034
c NA    NA    NA

Remove duplicate rows ignoring column order? in R

1 Answers1