
I am using R for analysis and would like to perform a permutation test. For this I am using a for loop that is quite slow, and I would like to make the code as fast as possible. I think vectorization is key. However, after several days of trying I still haven't found a suitable way to re-code this. I would deeply appreciate your help!

I have a symmetrical matrix with pairwise ecological distances between populations ("dist.mat"). I want to randomly shuffle the rows and columns of this distance matrix to generate a permuted distance matrix ("dist.mat.mix"). Then I would like to save the upper triangular values of this permuted distance matrix (there are "nr.pairs" of them). This process should be repeated several times ("nr.runs"). The result should be a matrix ("result") containing the permuted upper triangular values from all runs, with nrow=nr.runs and ncol=nr.pairs. Below is example R code that does what I want using a for loop:

# example number of populations
nr.pops <- 20

# example distance matrix
dist.mat <- as.matrix(dist(matrix(rnorm(nr.pops * 5), nr.pops, 5)))

# example number of runs
nr.runs <- 1000

# find number of unique pairwise distances in distance matrix
nr.pairs <- nr.pops*(nr.pops-1) / 2

# start loop
result <- matrix(NA, nr.runs, nr.pairs)
for (i in 1:nr.runs) {
  mix <- sample(nr.pops, replace=FALSE)
  dist.mat.mix <- dist.mat[mix, mix]
  result[i, ] <- dist.mat.mix[upper.tri(dist.mat.mix, diag=FALSE)]
}

# inspect result
result
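As a sanity check (a sketch added here, not part of the original question): jointly permuting rows and columns of a symmetric matrix only reorders the pairwise distances, so every row of `result` should contain the same multiset of values as the original upper triangle.

```r
# Sketch: verify that a joint row/column permutation preserves the
# multiset of upper-triangular distances (dist.mat is symmetric).
nr.pops <- 20
dist.mat <- as.matrix(dist(matrix(rnorm(nr.pops * 5), nr.pops, 5)))

mix <- sample(nr.pops)            # one random permutation
dist.mat.mix <- dist.mat[mix, mix]

orig.vals <- dist.mat[upper.tri(dist.mat)]
perm.vals <- dist.mat.mix[upper.tri(dist.mat.mix)]
stopifnot(all(sort(orig.vals) == sort(perm.vals)))
```

This also makes clear why the permutation must be applied to rows and columns simultaneously: permuting only one dimension would break the symmetry and mix distances with themselves.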

I already made some clumsy vectorization attempts with the base::replicate function, but this doesn't speed things up. Actually, it's a bit slower:

# my for loop approach
my.for.loop <- function() {
  result <- matrix(NA, nr.runs, nr.pairs)
  for (i in 1:nr.runs) {
    mix <- sample(nr.pops, replace=FALSE)
    dist.mat.mix <- dist.mat[mix, mix]
    result[i, ] <- dist.mat.mix[upper.tri(dist.mat.mix, diag=FALSE)]
  }
  result  # return the filled matrix
}

# my replicate approach
my.replicate <- function() {
  t(replicate(nr.runs, {
    mix <- sample(nr.pops, replace=FALSE)
    dist.mat.mix <- dist.mat[mix, mix]
    dist.mat.mix[upper.tri(dist.mat.mix, diag=FALSE)]
  }))
}

# compare speed
library(microbenchmark)
microbenchmark(my.for.loop(), my.replicate(), times=100L)

# Unit: milliseconds
# expr           min     lq      mean    median  uq      max       neval
# my.for.loop()  23.1792 24.4759 27.1274 25.5134 29.0666 61.5616   100
# my.replicate() 25.5293 27.4649 30.3495 30.2533 31.4267 68.6930   100    

I would deeply appreciate your support if you know how to speed up my for loop with a neat vectorized solution. Is this even possible?

HaRa
  • How large is your real data? and how many `nr.runs` you want to run? – minem Oct 28 '19 at 07:26
  • `nr.pops` is 20 and `nr.runs` is 1000. I updated this in the question. – HaRa Oct 28 '19 at 09:11
  • where exactly this is slow? what are you doing that this becomes slow? one run of `my.for.loop()` is 0.01 seconds. Are you calling this multiple times? how many? calling this function 1000 times takes +/- 20 seconds. To me that doesn't seem to be slow. – minem Oct 28 '19 at 09:50
  • This `for` loop is part of a larger calculation that repeats itself hundreds of millions of times. This larger calculation runs for several weeks. Reducing computational time of the `for` loop in question (even when it's just a few microseconds) can save me several hours or even days of run time. I found the `for` loop be the largest bottleneck in my larger calculation, therefore I want to make it as fast as possible. – HaRa Oct 28 '19 at 10:24
  • `replicate` is not vectorization; it's basically the same as `sapply`/`lapply`, and those are basically `for` loops. https://stackoverflow.com/questions/42393658/lapply-vs-for-loop-performance-r – minem Oct 28 '19 at 11:29
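To illustrate the last comment (a sketch added here, not from the thread): `replicate` is a thin wrapper around `sapply`, so given the same RNG seed it produces exactly the same draws as an explicit `sapply` loop, iteration by iteration:

```r
# replicate() just re-evaluates its expression n times via sapply(),
# so it loops like any explicit sapply/for construct.
set.seed(1)
a <- replicate(3, sample.int(5))

set.seed(1)
b <- sapply(1:3, function(i) sample.int(5))

stopifnot(identical(a, b))  # same draws, same 5 x 3 matrix
```

This is why swapping the for loop for `replicate` cannot remove the per-iteration overhead; only restructuring the computation (as in the answer below's index-based approach) can.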

1 Answer


Slightly faster:

minem <- function() {
  result <- matrix(NA, nr.runs, nr.pairs)
  ut <- upper.tri(matrix(NA, nr.pops, nr.pops)) # create upper triangular index matrix outside loop
  for (i in 1:nr.runs) {
    mix <- sample.int(nr.pops) # slightly faster sampling function
    result[i, ] <- dist.mat[mix, mix][ut]
  }
  result
}
microbenchmark(my.for.loop(), my.replicate(), minem(), times = 100L)
# Unit: microseconds
# expr               min      lq      mean   median       uq      max neval cld
# my.for.loop()   75.062  78.222  96.25288  80.1975 104.6915  249.284   100   a
# my.replicate() 118.519 122.667 152.25681 126.0250 165.1355  495.407   100   a
# minem()         45.432  48.000 104.23702  49.5800  52.9380 4848.986   100   a

Update: we can compute the necessary matrix indexes a little differently, so that all elements can be extracted at once:

minem4 <- function() {
  n <- dim(dist.mat)[1]
  ut <- upper.tri(matrix(NA, n, n))
  im <- matrix(1:n, n, n)
  p1 <- im[ut]                 # row indices of the upper-triangular cells
  p2 <- t(im)[ut]              # column indices of the upper-triangular cells
  dm <- as.vector(dist.mat)    # flatten to a plain vector (column-major order)

  si <- replicate(nr.runs, sample.int(nr.pops)) # all permutations at once (nr.pops x nr.runs)
  # column-major linear indexes of the permuted upper-triangular cells;
  # p1/p2 can play either role because dist.mat is symmetric
  p <- (si[p1, ] - 1L) * n + si[p2, ]
  result2 <- matrix(dm[p], nr.runs, nr.pairs, byrow = TRUE)
  result2
}

microbenchmark(my.for.loop(), minem(), minem4(), times = 100L)
# Unit: milliseconds
# expr                min        lq     mean    median        uq       max neval cld
# my.for.loop() 13.797526 14.977970 19.14794 17.071401 23.161867  29.98952   100   b
# minem()        8.366614  9.080490 11.82558  9.701725 15.748537  24.44325   100  a 
# minem4()       7.716343  8.169477 11.91422  8.723947  9.997626 208.90895   100  a 
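To unpack the index construction (a small worked sketch added here, using a 3 x 3 matrix; not part of the original answer): `p1` and `p2` hold the row and column positions of the upper-triangular cells, and `(col - 1) * n + row` converts a (row, col) pair into a column-major linear index. Because the distance matrix is symmetric, swapping the roles of `p1` and `p2`, as the answer's code does, picks the mirror element with the same value.

```r
n <- 3
m <- matrix(1:9, n, n)     # column-major, so m[r, c] == (c - 1) * n + r
ut <- upper.tri(matrix(NA, n, n))
im <- matrix(1:n, n, n)
p1 <- im[ut]               # rows of the upper-tri cells: 1, 1, 2
p2 <- t(im)[ut]            # cols of the upper-tri cells: 2, 3, 3

# linear indexes of the cells (p1, p2) reproduce m[upper.tri(m)]
stopifnot(all(m[(p2 - 1L) * n + p1] == m[ut]))

# after permuting rows/cols with mix, the same trick recovers the
# permuted upper triangle without ever building m[mix, mix]
mix <- c(2L, 3L, 1L)
stopifnot(all(m[(mix[p2] - 1L) * n + mix[p1]] == m[mix, mix][ut]))
```

The speedup comes from doing this index arithmetic for all runs in one matrix operation instead of materializing a permuted copy of the matrix on every iteration.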

Update 2: we can get some additional speedup by using the dqrng package's sampling function:

minem5 <- function() {
  n <- dim(dist.mat)[1]
  ut <- upper.tri(matrix(NA, n, n))
  im <- matrix(1:n, n, n)
  p1 <- im[ut]
  p2 <- t(im)[ut]
  dm <- as.vector(dist.mat)

  require(dqrng)
  si <- replicate(nr.runs, dqsample.int(nr.pops)) # faster sampling via dqrng
  p <- (si[p1, ] - 1L) * n + si[p2, ]
  result2 <- matrix(dm[p], nr.runs, nr.pairs, byrow = TRUE)
  result2
}

microbenchmark(my.for.loop(), minem(), minem4(), minem5(), times = 100L)
# Unit: milliseconds
# expr                min        lq      mean    median        uq      max neval  cld
# my.for.loop() 13.648983 14.672587 17.713467 15.265771 16.967894 36.18290   100    d
# minem()        8.282466  8.773725 10.679960  9.279602 10.335206 27.03683   100   c 
# minem4()       7.719503  8.208984  9.039870  8.493231  9.097873 25.32463   100  b  
# minem5()       6.134911  6.379850  7.226348  6.733035  7.195849 19.02458   100 a  
minem
  • Thank you very much, this is exactly what I was looking for! – HaRa Oct 28 '19 at 16:20
  • Could you please elaborate a bit on the logic behind your matrix indexes? What are `p1` and `p2` doing, and how do you arrive at `p` and, from there, at the final matrix `result2`? – HaRa Oct 28 '19 at 16:23