
I'm working on something that requires me to repeatedly calculate the elements of a large square matrix. The process involves reading data stored in another matrix and then computing each element from it. At the moment I am using a double `for` loop to do this.

library(matrixcalc)

# example data: 3 rows, 1000 columns (filled with random values so the example runs)
data <- matrix(runif(3 * 1000), nrow = 3, ncol = 1000)

# preallocate the result matrix
mat <- matrix(0, ncol(data), ncol(data))

for (x in 1:ncol(data)) {
    for (y in 1:ncol(data)) {
        mat[x, y] <- exp(-entrywise.norm(data[, x] - data[, y], 2))
    }
}

The problem is that this is extremely slow since my matrix is very large. What is the fastest alternative to this procedure?

Comp_Warrior
    Please give a [reproducible example](http://stackoverflow.com/a/5963610/1412059) of what you are doing now with the `for` loops. Answers will depend on the calculations you do inside the loops. – Roland Jul 04 '13 at 10:05
    It's still not reproducible because we don't have `entrywise.norm` nor `data`. – Roland Jul 04 '13 at 10:21
  • @Roland *entrywise.norm* is a function in the matrixcalc package, *data* is a 3x2500 matrix – Comp_Warrior Jul 04 '13 at 10:26
    @Comp_Warrior why should *we* do extra work to answer *your* problem? Please read [**how to make a great reproducible example**](http://stackoverflow.com/q/5963269/1478381). A simple toy example will do. How hard is it to add a couple of lines of code... e.g. `require(matrixcalc) ; data <- matrix( 1 , 3 , 2500 )`. – Simon O'Hanlon Jul 04 '13 at 10:38
    @Comp_Warrior or `data <- matrix( 1 , 2500 ,3)`..... – agstudy Jul 04 '13 at 10:40
  • @SimonO101 I have implemented the suggestions, but will only accept the answer when I am able to see if they are quicker with my data. (later) – Comp_Warrior Jul 04 '13 at 12:48
  • @Comp_Warrior no you haven't. Check your code. You have a single column matrix filled with 3's. If all you do is copy/paste from the ***example*** I gave you (i.e. intended as a jumping off point not an answer) all you will do is fill a 3 row, 2500 column matrix with 1's. Totally meaningless for this problem. Do a modicum of work on your own and make a *SMALL REPRODUCIBLE EXAMPLE*. – Simon O'Hanlon Jul 04 '13 at 12:54

4 Answers


Short and very fast:

mat <- exp(-as.matrix(dist(t(data))))

I would also recommend the `fields::rdist` function as a faster alternative to `dist` for computing the matrix of Euclidean distances, so if loading an extra package is not an issue, consider:

library(fields)
mat <- exp(-rdist(t(data)))

To give you an idea of the speed improvement:

data <- matrix(runif(3000), nrow=3, ncol=1000)

OP <- function(data) {
  require(matrixcalc)
  mat <- matrix(0, ncol(data), ncol(data))
  for(x in 1:ncol(data)){
    for(y in 1:ncol(data)){
      mat[x,y]=exp(-entrywise.norm(data[,x]-data[,y],2))
    }
  }
  mat
}

flodel1 <- function(data) exp(-as.matrix(dist(t(data))))
flodel2 <- function(data) {
  require(fields)
  exp(-rdist(t(data)))
}

system.time(res1 <- OP(data))
#   user  system elapsed 
# 22.708   2.080  24.602 
system.time(res2 <- flodel1(data))
#   user  system elapsed 
#  0.112   0.025   0.136 
system.time(res3 <- flodel2(data))
#   user  system elapsed 
#  0.048   0.000   0.049 

(Note that in the case of OP and flodel2, these runtimes do not include the loading of the packages as they had been loaded prior to the tests.)
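A quick sanity check, not part of the original benchmark: all three functions should return the same matrix up to floating-point rounding, which can be confirmed on the `res1`, `res2` and `res3` objects from above:

max(abs(res1 - res2))  # should be essentially zero (machine precision)
max(abs(res1 - res3))  # likewise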

flodel
  • +1 Wow. I don't think you should have removed your comment explaining why you downvoted the OP, because it is (still) very relevant. – Simon O'Hanlon Jul 04 '13 at 11:58

This should be considerably faster:

nc <- ncol(data)

mat <- diag(nc)  # exp(-0) = 1, so the diagonal is already correct

# fill only the lower triangle
for (x in 2:nc) {
   for (y in 1:x) {
       mat[x, y] <- exp(-sqrt(sum((data[, x] - data[, y])^2)))
   }
}

# copy the lower triangle into the upper triangle
mat[upper.tri(mat)] <- t(mat)[upper.tri(mat)]
Sven Hohenstein

R stores arrays in column-major order, so swapping the order of the two for loops can give a performance boost: the result matrix is then written in a contiguous pattern, which makes better use of the CPU cache (see the small illustration after the code below).

nc <- ncol(data)

for (y in 1:nc) {        # outer loop over columns now
    for (x in 1:nc) {    # inner loop over rows, so mat is filled down each column
        mat[x, y] <- exp(-entrywise.norm(data[, x] - data[, y], 2))
    }
}
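To see why the order matters, here is a tiny illustration (not from the original answer) of R's column-major fill order:

m <- matrix(1:6, nrow = 2)
m
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
# the values 1:6 run down the columns, so indexing mat[x, y] with x varying
# fastest follows the physical memory layout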

Your "matrix" is a 2D array, right?

If you need even more speed, you can unroll part of the inner loop to reduce branching overhead and get better caching/prefetching.

for (y in 1:nc) {
    # step through the rows eight at a time (assumes nc is a multiple of 8)
    for (x in seq(1, nc, by = 8)) {
        mat[x,     y] <- exp(-entrywise.norm(data[, x]     - data[, y], 2))
        mat[x + 1, y] <- exp(-entrywise.norm(data[, x + 1] - data[, y], 2))
        mat[x + 2, y] <- exp(-entrywise.norm(data[, x + 2] - data[, y], 2))
        mat[x + 3, y] <- exp(-entrywise.norm(data[, x + 3] - data[, y], 2))
        mat[x + 4, y] <- exp(-entrywise.norm(data[, x + 4] - data[, y], 2))
        mat[x + 5, y] <- exp(-entrywise.norm(data[, x + 5] - data[, y], 2))
        mat[x + 6, y] <- exp(-entrywise.norm(data[, x + 6] - data[, y], 2))
        mat[x + 7, y] <- exp(-entrywise.norm(data[, x + 7] - data[, y], 2))
    }
}
huseyin tugrul buyukisik

You can use `colSums` instead of the inner loop. This builds on the answer by @Sven Hohenstein:

nc <- ncol(data)

mat <- diag(nc)  # exp(-0) = 1, so the diagonal is already correct

# compute row x of the lower triangle in one vectorised step:
# data[, 1:x] - data[, x] subtracts column x from each of the first x columns
for (x in 2:nc) {
  mat[x, 1:x] <- exp(-sqrt(colSums((data[, 1:x] - data[, x])^2)))
}

# mirror the lower triangle into the upper triangle
mat[upper.tri(mat)] <- t(mat)[upper.tri(mat)]
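As a quick check (not in the original answer), you can wrap the snippet in a function and compare it against the `dist`-based solution from the accepted answer; the name `colsums_version` below is just for this comparison:

colsums_version <- function(data) {
  nc <- ncol(data)
  mat <- diag(nc)
  for (x in 2:nc) {
    mat[x, 1:x] <- exp(-sqrt(colSums((data[, 1:x] - data[, x])^2)))
  }
  mat[upper.tri(mat)] <- t(mat)[upper.tri(mat)]
  mat
}

all.equal(colsums_version(data), exp(-as.matrix(dist(t(data)))), check.attributes = FALSE)
# should return TRUE (check.attributes = FALSE ignores the dimnames that dist adds)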
Roland