I am having trouble finding a vectorized form of a specific loop in R. My goal is to improve the performance of the loop, because it has to run thousands of times in the algorithm.

For each row of an array, I want to find the position of the lowest value within each section of the row, where the section sizes are given by a vector 'Level'.

Example:

Level = c(2,3)

Let the first row of array X be: c(2, -1, 3, 0.5, 4).

Searching for the position of the lowest value in the range 1:Level[1] of the row (that is, (2, -1)), I get 2, because -1 < 2 and -1 is in the second position of the row. Then, searching for the position of the lowest value in the second range (Level[1]+1):(Level[1]+Level[2]) (that is, (3, 0.5, 4)), I get 4, because 0.5 < 3 < 4 and 0.5 is in the fourth position of the row.
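
As a quick check of the example above, here is a minimal sketch for a single row. The names `x_row`, `starts`, `ends`, and `pos` are only illustrative and do not appear in the code further down.

```r
Level <- c(2, 3)
x_row <- c(2, -1, 3, 0.5, 4)

ends   <- cumsum(Level)        # last column of each section: 2, 5
starts <- ends - Level + 1     # first column of each section: 1, 3

# which.min() gives the position inside the section; adding the offset
# (s - 1) turns it into a position within the whole row
pos <- mapply(function(s, e) which.min(x_row[s:e]) + s - 1, starts, ends)
pos
## [1] 2 4
```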

I have to perform this over each row in the array.

My solution to the problem works as follows:

Level = c(2,3,3)  #elements per section, here: 3 sections with 2,3 and 3 levels
rows = 10  #number of rows in array X
X = matrix(runif(rows*sum(Level),-5,5),rows,sum(Level))  #array with 10 rows and sum(Level) columns, here: 8
Position_min = matrix(0,rows,length(Level))  #array in which the position of minimum values for each section and row are stored
for(i in 1:rows){
 for(j in 1:length(Level)){            #length(Level) is number of intervals, here: 3
  if(j == 1){coeff=0}else{coeff=1}
  Position_min[i,j] = coeff*sum(Level[1:(j-1)]) + which(X[i,(coeff*sum(Level[1:(j-1)])+1):sum(Level[1:j])] == min(X[i,(coeff*sum(Level[1:(j-1)])+1):sum(Level[1:j])]))
  }
}

It works fine but I would prefer a solution with better performance. Any ideas?

Khashaa
Stromberg

2 Answers

This removes the loop over rows, so only the loop over sections remains:

Level1=c(0,cumsum(Level))
for(j in 1:(length(Level1)-1)){
    Position_min[,j]=max.col(-X[,(Level1[j]+1):Level1[j+1]])+(Level1[j])
}
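
Two details may be worth guarding against, sketched here as assumptions of mine rather than as part of the answer above: `max.col()` breaks ties at random by default, and a section with only one column collapses to a plain vector unless `drop = FALSE` is used. The small matrix and the name `Position_min2` are made up for illustration.

```r
Level <- c(2, 1, 3)                       # note the one-column section
X <- matrix(c( 2, -1,  3,  0.5,  4,  1,
              -3, -3,  0,  7,   -2, -2),
            nrow = 2, byrow = TRUE)       # second row contains ties

Level1 <- c(0, cumsum(Level))             # section boundaries: 0, 2, 3, 6
Position_min2 <- matrix(0, nrow(X), length(Level))

for (j in seq_along(Level)) {
  cols <- (Level1[j] + 1):Level1[j + 1]
  # drop = FALSE keeps a one-column section as a matrix;
  # ties.method = "first" makes tied minima deterministic
  Position_min2[, j] <- max.col(-X[, cols, drop = FALSE],
                                ties.method = "first") + Level1[j]
}
Position_min2
##      [,1] [,2] [,3]
## [1,]    2    3    4
## [2,]    1    3    5
```

With continuous random data such as `runif()` output, exact ties are unlikely, so the default behaviour is usually harmless there.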
Joswin K J
  • I was just about to post the same changes that you edited into your code. Thumbs up, it works! A quick speed test showed that this solution is about 33 times faster than my original code. Thank you very much for your solution! – Stromberg Jul 13 '15 at 12:10

Here is a "fully vectorized" solution with no explicit loops:

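findmins <- function(x, level) {
    # label each element of the row with the index of its section
    series <- rep(seq_along(level), level)
    x <- split(x, factor(series))
    # position of the minimum within each section
    minsSplit <- as.numeric(sapply(x, which.min))
    # shift by the number of columns preceding each section
    minsSplit + c(0, cumsum(level[-length(level)]))
}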

Position_min_vectorized <- t(apply(X, 1, findmins, Level))
identical(Position_min, Position_min_vectorized)
## [1] TRUE

You can get better performance by making your matrix into a list, and then using parallel's mclapply():

X_list <- split(X, factor(1:nrow(X)))
do.call(rbind, parallel::mclapply(X_list, findmins, Level))
##    [,1] [,2] [,3]
## 1     1    5    6
## 2     2    3    6
## 3     1    4    7
## 4     1    5    6
## 5     2    5    7
## 6     2    4    6
## 7     1    5    8
## 8     1    5    8
## 9     1    3    8
## 10    1    3    8
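
On platforms where forking is not available, a socket cluster is one possible alternative. The following is a minimal sketch, assuming two workers are acceptable; the small matrix and the names `cl` and `res` are introduced here for illustration, and `findmins()` is repeated (with `vapply()` in place of `sapply()`) so the block runs on its own.

```r
library(parallel)

findmins <- function(x, level) {
  series <- rep(seq_along(level), level)
  mins <- vapply(split(x, series), which.min, integer(1))
  as.numeric(mins) + c(0, cumsum(level[-length(level)]))
}

Level <- c(2, 3, 3)
set.seed(1)
X <- matrix(runif(4 * sum(Level), -5, 5), 4, sum(Level))
X_list <- split(X, row(X))          # one list element per row

cl <- makeCluster(2)                # socket cluster with two workers
res <- tryCatch(
  do.call(rbind, parLapply(cl, X_list, findmins, level = Level)),
  finally = stopCluster(cl)         # always shut the workers down
)

# compare against the serial version
identical(unname(res), t(apply(X, 1, findmins, Level)))
```

For a matrix this small, the cost of starting the workers will likely outweigh any gain, so this only makes sense for much larger inputs.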
Ken Benoit
  • "Fully vectorized", yet full of `apply` and `sapply`? – ExperimenteR Jul 13 '15 at 12:33
  • Thank you for your solution to the problem! It seems that the solution of @user3169080 is faster, even when I apply the parallelization. – Stromberg Jul 13 '15 at 12:48
  • Touché, @ExperimenteR, *apply is just a loop wrapper. But over-vectorisation is also a deadly sin: burns-stat.com/pages/Tutor/R_inferno.pdf p24. user3169080's solution is MUCH faster, but I'm glad the parallelization improved it further. Note that `mclapply()` won't actually parallelize on Windows (but there are other ways to parallelize on that platform). – Ken Benoit Jul 13 '15 at 13:07