2

I'm hoping to vectorize the following loop:

for (i in 1:n) {
    for (j in 1:m) {
        temp_mat[i,j]=min(temp_mat[i,j],1);
    }
}  

I thought I could do temp_mat=min(temp_mat,1), but this is not giving me the desired result. Is there a way to vectorize this loop to make it much faster?

Zheyuan Li
  • 71,365
  • 17
  • 180
  • 248
drjrm3
  • 4,474
  • 10
  • 53
  • 91

2 Answers2

10

Just use temp_mat <- pmin(temp_mat, 1). See ?pmin for more use of parallel minima.

Example:

set.seed(0); A <- matrix(sample(1:3, 25, replace = T), 5)
#> A
#     [,1] [,2] [,3] [,4] [,5]
#[1,]    3    1    1    3    3
#[2,]    1    3    1    2    3
#[3,]    2    3    1    3    1
#[4,]    2    2    3    3    2
#[5,]    3    2    2    2    1
B <- pmin(A, 2)
#> B
#     [,1] [,2] [,3] [,4] [,5]
#[1,]    2    1    1    2    2
#[2,]    1    2    1    2    2
#[3,]    2    2    1    2    1
#[4,]    2    2    2    2    2
#[5,]    2    2    2    2    1

update

Since you have background in computational science, I would like to provide more information.

pmin is fast, but is far from high performance. Its prefix "parallel" only suggests element-wise. The meaning of "vectorization" in R is not the same as "SIMD vectorization" in HPC. R is an interpreted language, so "vectorization" in R means opting for C level loop rather than R level loop. Therefore, pmin is just coded with a trivial C loop.

Real high performance computing should benefit from SIMD vectorization. I believe you know SSE/AVX intrinsics. So if you write a simple C code, using _mm_min_pd from SSE2, you will get ~2 times speedup from pmin; if you see _mm256_min_pd from AVX, you will get ~4 times speedup from pmin.

Unfortunately, R itself can not do any SIMD. I have an answer to a post at Does R leverage SIMD when doing vectorized calculations? regarding this issue. For your question, even if you link your R to a HPC BLAS, pmin will not benefit from SIMD, simply because pmin does not involve any BLAS operations. So a better bet is to write compiled code yourself.

Community
  • 1
  • 1
Zheyuan Li
  • 71,365
  • 17
  • 180
  • 248
  • I don't think it has anything to do with R being column-major. `pmin` returns the element-wise minimum between the items passed, so the 1 is just being coerced into an object of the same dimensions. For example, if different size vectors are passed, you get a warning about elements being recycled. – Gabe May 19 '16 at 19:16
  • 2
    Although it was not what the OP requested, there are a number of ways to realize massively parallel computations in R. Examples include the packages `snow`, `multicore`, `foreach`, `Rmpi`, `Rth`, `gputools`, and the base package `parallel`, which has merged a few of these formerly separate packages. Here's a [good reference](https://books.google.com/books/about/Parallel_Computing_for_Data_Science.html?id=SsbECQAAQBAJ) about the state of the art on this topic. – RHertel May 20 '16 at 03:53
  • "For example, OP's problem can not benefit from this form, becomes it is memory bound. Setting up multiple threads will cause slow down." Why? Wouldn't it be a standard task in parallel programming to split the matrix into separate chunks and let each one be treated by a separate thread? There is no need to communicate between the threads in this case. This looks to me like a standard example of an embarrassingly parallel situation, which can be treated easily with R. – RHertel May 20 '16 at 04:36
  • No problem. We can leave it at that if you like. I don't want to start a dispute, I just don't see your point in this case. This looks to me like a quite different situation compared to the Cholesky decomposition that you mentioned. – RHertel May 20 '16 at 04:39
4

The question is slightly confusing because min() is vectorized. In order to obtain the desired result in this specific case, it is however not necessary to use this function. Logical subsetting provides a probably more efficient (and certainly more compact) alternative.

If I understood your desired output correctly, a modification of the matrix as performed by the nested loops in your code can be achieved with a single command:

temp_mat[temp_mat > 1] <- 1

Hope this helps.


set.seed(123)
temp_mat <- matrix(2*runif(50),10)
#> temp_mat
#           [,1]       [,2]      [,3]       [,4]      [,5]
# [1,] 0.5751550 1.91366669 1.7790786 1.92604847 0.2856000
# [2,] 1.5766103 0.90666831 1.3856068 1.80459809 0.8290927
# [3,] 0.8179538 1.35514127 1.2810136 1.38141056 0.8274487
# [4,] 1.7660348 1.14526680 1.9885396 1.59093484 0.7376909
# [5,] 1.8809346 0.20584937 1.3114116 0.04922737 0.3048895
# [6,] 0.0911130 1.79964994 1.4170609 0.95559194 0.2776121
# [7,] 1.0562110 0.49217547 1.0881320 1.51691908 0.4660682
# [8,] 1.7848381 0.08411907 1.1882840 0.43281587 0.9319249
# [9,] 1.1028700 0.65584144 0.5783195 0.63636202 0.5319453
#[10,] 0.9132295 1.90900730 0.2942273 0.46325157 1.7156554
temp_mat[temp_mat > 1] <- 1
#> temp_mat
#           [,1]       [,2]      [,3]       [,4]      [,5]
# [1,] 0.5751550 1.00000000 1.0000000 1.00000000 0.2856000
# [2,] 1.0000000 0.90666831 1.0000000 1.00000000 0.8290927
# [3,] 0.8179538 1.00000000 1.0000000 1.00000000 0.8274487
# [4,] 1.0000000 1.00000000 1.0000000 1.00000000 0.7376909
# [5,] 1.0000000 0.20584937 1.0000000 0.04922737 0.3048895
# [6,] 0.0911130 1.00000000 1.0000000 0.95559194 0.2776121
# [7,] 1.0000000 0.49217547 1.0000000 1.00000000 0.4660682
# [8,] 1.0000000 0.08411907 1.0000000 0.43281587 0.9319249
# [9,] 1.0000000 0.65584144 0.5783195 0.63636202 0.5319453
#[10,] 0.9132295 1.00000000 0.2942273 0.46325157 1.0000000
RHertel
  • 23,412
  • 5
  • 38
  • 64