How to get only unique combinations of variables where entries can be in either variable

Question

Given that we have

j<-c("a","b","c","d")  
l<-expand.grid(j,j)

print(l)

   Var1 Var2
1     a    a
2     b    a
3     c    a
4     d    a
5     a    b
6     b    b
7     c    b
8     d    b
9     a    c
10    b    c
11    c    c
12    d    c
13    a    d
14    b    d
15    c    d
16    d    d

I want to return only the unique pairs, regardless of which column each entry is in, such as:

print(newl)
Var1 Var2
a    a
a    b
a    c
a    d
b    b
b    c
b    d
c    c
c    d
d    d

I have found a lot of answers about unique combinations of variables, but only for cases where the values do not cross over between columns.

This all comes from doing corr.test {psych} and unrolling the corr.test$r into a single vector using as.vector(corr.test$r).

To see which pair of variables each correlation belongs to, I used

names<-expand.grid(rownames(corr.test$r),colnames(corr.test$r))

which ends up being consistent with the structure of the 'unrolled' r matrix from as.vector.

But it returns the whole matrix (both the upper and lower triangles). So I'm looking for a way to take only the unique correlations (half of the data.frame).

Have a look at @Ferdinands answer here http://stackoverflow.com/questions/17171148/non-redundant-version-of-expand-grid for the expand.grid solution. — user20650, May 30 '14 at 00:45

score 4 · Accepted Answer · answered May 30 '14 at 00:48

The combn function gives you all n-element combinations of the elements of a vector; however, it does not pair elements with themselves. You can add those pairs on fairly easily, so you can get the combinations you want with

cbind(combn(j,2), rbind(j,j))

#   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# j "a"  "a"  "a"  "b"  "b"  "c"  "a"  "b"  "c"  "d"  
# j "b"  "c"  "d"  "c"  "d"  "d"  "a"  "b"  "c"  "d"
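The matrix above holds one pair per column. If a two-column data frame like the one in the question is wanted, a minimal sketch is to transpose it and rename the columns. The names `pairs` and `newl` and the final sorting step are illustrative choices, not part of the original answer.

```r
j <- c("a", "b", "c", "d")

# Pairs of distinct elements from combn(), plus each element paired with itself.
pairs <- cbind(combn(j, 2), rbind(j, j))

# Transpose so each pair is a row, then label the columns as expand.grid() does.
newl <- as.data.frame(t(pairs), stringsAsFactors = FALSE)
names(newl) <- c("Var1", "Var2")

# Optional: sort so the rows follow the order shown in the question.
newl <- newl[order(newl$Var1, newl$Var2), ]
rownames(newl) <- NULL

print(newl)
```

Sorting is only cosmetic here; the set of pairs is the same with or without it.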

score 1 · Answer 2 · answered May 30 '14 at 01:09

You can avoid this by reshaping the correlation matrix directly:

library(psych)
library(reshape2)

# example data
dat <- mtcars[1:4]

# For all correlations
melt(corr.test(dat)$r)

# For unique correlations
out <- corr.test(dat)$r
out[upper.tri(out)] <- NA    

melt(out, na.rm=TRUE)

#     Var1 Var2      value
#  1   mpg  mpg  1.0000000
#  2   cyl  mpg -0.8521620
#  3  disp  mpg -0.8475514
#  4    hp  mpg -0.7761684
#  6   cyl  cyl  1.0000000
#  7  disp  cyl  0.9020329
#  8    hp  cyl  0.8324475
#  11 disp disp  1.0000000
#  12   hp disp  0.7909486
#  16   hp   hp  1.0000000
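The same filtering can probably be done in base R, staying close to the expand.grid / as.vector approach from the question. The sketch below uses `cor()` as a stand-in for `corr.test(dat)$r` so that it runs without extra packages (an assumption made for illustration); the names `pairs`, `keep` and `unique_pairs` are hypothetical.

```r
# cor() stands in for corr.test(dat)$r here (an assumption, to avoid extra packages).
dat <- mtcars[1:4]
r <- cor(dat)

# One row per cell of r, in the same column-major order as as.vector(r).
pairs <- expand.grid(Var1 = rownames(r), Var2 = colnames(r),
                     stringsAsFactors = FALSE)
pairs$value <- as.vector(r)

# lower.tri() marks the cells on or below the diagonal;
# as.vector() unrolls that mask in the same order as the values.
keep <- as.vector(lower.tri(r, diag = TRUE))
unique_pairs <- pairs[keep, ]

print(unique_pairs)
```

Setting `diag = FALSE` instead would also drop the correlations of each variable with itself.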

score 0 · Answer 3 · answered May 30 '14 at 00:32

One thing you could do is put the pairs in an array, using Var1 as the key and Var2 as the value, and then add each pair to a temporary array only if that pair doesn't already exist in it.
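A rough sketch of that idea in R might look like the following. It assumes that a pair counts as "already existing" regardless of the order of its two entries, and that no entry contains the `|` character used to build the key; `seen`, `keep` and `key` are hypothetical names.

```r
j <- c("a", "b", "c", "d")
l <- expand.grid(Var1 = j, Var2 = j, stringsAsFactors = FALSE)

seen <- character(0)       # the "temp array" of pairs already kept
keep <- logical(nrow(l))   # which rows of l to return (all FALSE to start)

for (i in seq_len(nrow(l))) {
  # Sorting the two entries makes (b, a) and (a, b) produce the same key.
  # Assumes no entry contains "|".
  key <- paste(sort(c(l$Var1[i], l$Var2[i])), collapse = "|")
  if (!(key %in% seen)) {
    seen <- c(seen, key)
    keep[i] <- TRUE
  }
}

newl <- l[keep, ]
print(newl)
```

This keeps the first occurrence of each pair, so the surviving rows have the entries in whichever order expand.grid produced them first.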

Zino


score 0 · Answer 4 · answered May 30 '14 at 01:16

Thank you for your answers.

I ended up taking a shot and here is what I came up with:

j<-c("a","b","c","d")  
l<-expand.grid(j,j)


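# Keep one row per unordered pair, dropping self-pairs such as (a, a).
twist <- function(l) {
  # Remove rows where both entries are the same.
  l <- subset(l, l[, 1] != l[, 2])
  # Half of the remaining rows are mirror images of the other half.
  leng <- nrow(l) / 2
  for (i in 1:leng) {
    # Swap the two entries of row i ...
    g1 <- l[, 1]
    g2 <- l[, 2]
    g1[i] <- l[i, 2]
    g2[i] <- l[i, 1]
    l[, 1] <- g1
    l[, 2] <- g2
    # ... so that unique() drops the mirrored copy further down.
    l <- unique(l[c("Var1", "Var2")])
  }
  return(l)
}
k <- twist(l)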

print(k)

   Var1 Var2
2     a    b
3     a    c
4     a    d
7     b    c
8     b    d
12    c    d

I called it 'twist' for pretty obvious reasons. Feel free to critique it.
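As one possible critique, the same result can likely be reached without a loop. The sketch below is an alternative technique, not the function above: it puts the alphabetically smaller entry of each pair first with `pmin()`/`pmax()` and then lets `unique()` remove the mirrored rows. The names `canon` and `k2` are hypothetical, and the columns are kept as character vectors rather than factors.

```r
j <- c("a", "b", "c", "d")
l <- expand.grid(Var1 = j, Var2 = j, stringsAsFactors = FALSE)

# Put the alphabetically smaller entry of each pair first ...
canon <- data.frame(Var1 = pmin(l$Var1, l$Var2),
                    Var2 = pmax(l$Var1, l$Var2),
                    stringsAsFactors = FALSE)

# ... so mirrored rows become identical and unique() can drop them.
k2 <- unique(canon)

# Drop self-pairs such as (a, a), as twist() does.
k2 <- k2[k2$Var1 != k2$Var2, ]

print(k2)
```

Leaving out the last filtering step keeps the self-pairs, which gives the ten rows asked for in the question.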

Quick note: you can do this with `t(combn(j,2))`. — user20650, May 30 '14 at 01:18
