1

I'm new to R and having trouble vectorizing a nested loop that is particularly slow. The loop goes through a list of centers (vectors stored in a structure) and finds the distance between these vectors and the rows of an array called x below. I know this needs to be vectorized for speed, but cannot figure out the appropriate functions to or use of apply to do so.

clusterCenters <- matrix(runif(10000),nrow=100)
clusterMembers <- matrix(runif(400000),nrow=4000)

features <- matrix(0,(dim(clusterMembers)[1]),(dim(clusterCenters)[1]))

for(c in 1:dim(clusterCenters)[1]){
  center <- clusterCenters[c,]
  for(v in 1:(dim(clusterMembers)[1])){
    vector <- clusterMembers[v,]
    features[v,c] <- sqrt(sum((center - vector)^2))
  }
}

Thanks for any help.

Sevenless
  • 2,805
  • 4
  • 28
  • 47
  • 2
    Please provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Had you done that, you would have noticed the syntax error in your code. – Joshua Ulrich Mar 04 '13 at 17:33
  • 1
    (-1) I'd be happy to up-vote if you provide a reproducible example, that is expected of one, especially after 35 questions. – Arun Mar 04 '13 at 17:39
  • @Arun, my apologies. I know better and was sloppy. I've generated a reproducible example that illustrates the issue, although the dimensions involved in the real problem are much larger. – Sevenless Mar 04 '13 at 18:00

1 Answers1

2

You can take advantage of R's recycling rules to make this a bit faster. But you have to know and account for the fact that R stores matrices in column-major order. You do that by transposing clusterMembers and then the center vector will be recycled along the columns of t(clusterMembers).

set.seed(21)
clusterCenters <- matrix(runif(10000),nrow=100)
clusterMembers <- matrix(runif(400000),nrow=4000)
# your original code in function form
seven <- function() {
  features <- matrix(0,(dim(clusterMembers)[1]),(dim(clusterCenters)[1]))
  for(c in 1:dim(clusterCenters)[1]){
    center <- clusterCenters[c,]
    for(v in 1:(dim(clusterMembers)[1])){
      vector <- clusterMembers[v,]
      features[v,c] <- sqrt(sum((center - vector)^2))
    }
  }
  features
}
# my fancy function
josh <- function() {
  tcm <- t(clusterMembers)
  Features <- matrix(0,ncol(tcm),nrow(clusterCenters))
  for(i in 1:nrow(clusterCenters)) {
    # clusterCenters[i,] returns a vector because drop=TRUE by default
    Features[,i] <- colSums((clusterCenters[i,]-tcm)^2)
  }
  Features <- sqrt(Features)  # outside the loop to avoid function calls
}
system.time(seven())
#    user  system elapsed 
#     2.7     0.0     2.7 
system.time(josh())
#    user  system elapsed 
#    0.28    0.11    0.39 
identical(seven(),josh())
# [1] TRUE
Joshua Ulrich
  • 173,410
  • 32
  • 338
  • 418