
Let's say I have a matrix x which contains 10 rows and 2 columns. I want to generate a new matrix M that contains each unique pair of rows from x, including each row paired with itself. That is, M should be a new matrix with 55 rows (10 × 11 / 2) and 4 columns.

E.g.,

x <- matrix(nrow = 10, ncol = 2, 1:20)

M <- data.frame(matrix(ncol = 4, nrow = 55))
k <- 1
for (i in 1:nrow(x)) {
  for (j in i:nrow(x)) {
    # row i of x followed by row j of x
    M[k, ] <- c(x[i, ], x[j, ])
    k <- k + 1
  }
}

So, x is:

      [,1] [,2]
 [1,]    1   11
 [2,]    2   12
 [3,]    3   13
 [4,]    4   14
 [5,]    5   15
 [6,]    6   16
 [7,]    7   17
 [8,]    8   18
 [9,]    9   19
[10,]   10   20

And then M has 4 columns: the first two are one row from x and the next two are another row from x:

> head(M,10)
   X1 X2 X3 X4
1   1 11  1 11
2   1 11  2 12
3   1 11  3 13
4   1 11  4 14
5   1 11  5 15
6   1 11  6 16
7   1 11  7 17
8   1 11  8 18
9   1 11  9 19
10  1 11 10 20

Is there either a faster or simpler (or both) way of doing this in R?

Josh Reich

5 Answers


The expand.grid() function is useful for this:

R> GG <- expand.grid(1:10,1:10)
R> GG <- GG[GG[,1]>=GG[,2],]     # trim it to your 55 pairs
R> dim(GG)
[1] 55  2
R> head(GG)
  Var1 Var2
1    1    1
2    2    1
3    3    1
4    4    1
5    5    1
6    6    1
R> 

Now you have the n*(n+1)/2 index pairs, and you can simply use them to index your original matrix.
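As a quick check of that count (a sketch using only base R's choose()), the 55 rows break down into the 45 pairs of distinct rows plus the 10 rows paired with themselves:

```r
n <- 10                   # number of rows in x
n * (n + 1) / 2           # 55
choose(n, 2) + n          # 45 distinct pairs + 10 self-pairs = 55
choose(n + 1, 2)          # same count: 2-combinations with repetition
```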

Dirk Eddelbuettel

I'm not quite grokking what you are doing, so I'll just throw out something that may or may not help.

Here's what I think of as the Cartesian product of the two columns:

expand.grid(x[,1],x[,2])
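Note that this pairs individual values rather than rows. Every entry of column 1 is combined with every entry of column 2, which gives 10 × 10 = 100 rows of 2 columns rather than the 55 × 4 result asked for. A quick check, assuming the question's x:

```r
x <- matrix(nrow = 10, ncol = 2, 1:20)
G <- expand.grid(x[, 1], x[, 2])
dim(G)       # 100 2
head(G, 3)   # Var1 = 1, 2, 3 with Var2 = 11 throughout (Var1 varies fastest)
```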
JD Long

You can also try the "relations" package; see its vignette for details. It should work like this:

relation_table(x %><% x)
Shane

Using Dirk's answer:

idx <- expand.grid(1:nrow(x), 1:nrow(x))
idx <- idx[idx[, 1] >= idx[, 2], ]        # keep the 55 pairs with Var1 >= Var2
N <- cbind(x[idx[, 2], ], x[idx[, 1], ])  # Var2 plays the loop's i, Var1 its j

> all(M == N)
[1] TRUE
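A variation on the same idea (a sketch, not taken from any of the answers) builds the index pairs with lower.tri() and which(..., arr.ind = TRUE) instead of filtering a full expand.grid() result. Because which() returns positions in column-major order, the pairs come out in the same order as the loop-built M. It still allocates an n × n logical matrix, so don't assume it scales better; the names `n`, `idx2` and `N2` are just illustrative.

```r
x <- matrix(nrow = 10, ncol = 2, 1:20)
n <- nrow(x)

# Positions on or below the diagonal, in column-major order:
# (1,1), (2,1), ..., (10,1), (2,2), ..., (10,10)
idx2 <- which(lower.tri(diag(n), diag = TRUE), arr.ind = TRUE)

# "col" plays the role of the loop's i, "row" the role of j
N2 <- cbind(x[idx2[, "col"], ], x[idx2[, "row"], ])
dim(N2)   # 55 4
```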

Thanks everyone!

Josh Reich

Inspired by the other answers, here is a function implementing the Cartesian product of the rows of two matrices. With two arguments it returns the full Cartesian product; with only one argument it pairs the matrix with itself and keeps only one of each symmetric pair (self-pairs included):

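cartesian_prod <- function(M1, M2) {
  if (missing(M2)) {
    # One argument: pair M1 with itself, keeping each unordered pair once
    M2 <- M1
    ind <- expand.grid(1:NROW(M1), 1:NROW(M2))
    ind <- ind[ind[, 1] >= ind[, 2], ]
  } else {
    # Two arguments: full Cartesian product of the rows
    ind <- expand.grid(1:NROW(M1), 1:NROW(M2))
  }
  # drop = FALSE keeps single-column inputs as matrices
  cbind(M1[ind[, 1], , drop = FALSE], M2[ind[, 2], , drop = FALSE])
}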

kjetil b halvorsen