Parallelize nested for-loop on 3 dimensional array in R

Question

Using R on a Windows machine, I am currently running a nested loop on a 3D array (720x360x1368) which cycles through d1 and d2 to apply a function over d3 and assemble the output to a new array of similar dimensionality.

In the following reproducible example, I have reduced the dimensions by factor 10, to make execution faster.

library(SPEI)

old.array = array(abs(rnorm(50)), dim=c(72,36,136))

new.array = array(dim=c(72,36,136))

for (i in 1:72) {
  for (j in 1:36) {
    new.listoflists <- spi(ts(old.array[i,j,], freq=12, start=c(1901,1)), 1, na.rm = T)
    new.array[i,j,] = new.listoflists$fitted
  }
}

where spi() is a function from the SPEI package returning a list of lists, of which one particular list $fittedof length 1368 is used from each loop increment to cunstruct the new array.

While this loop works flawlessly, it takes quite a long time to compute. I have read that foreachcan be used to parallelize for loops.

However, I do not understand how the nesting and the assembling of the new array can be achieved such that the dimnames of the old and the new array are consistent.

(In the end, what I want to be able to, is to transform both the old and the new array into a "flat" long panel data frame using as.data.frame.table() and merge them along their three dimensions.)

Any help on how I can achieve the desired output using parallel computing will be highly appreciated!

Cheers
CubicTom

I can't really tell what you want to do ([reproducible example!](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)) but have you looked into using `apply()` which is generalisable across n-dimensional arrays? The second argument of `apply()` defines the dimensions to apply a function over, e.g. `apply( old.array , 1:2 , FUN = function(x) paste0(x) )` to apply the `paste0()` function across the third dimension of your array as an example. — Simon O'Hanlon, Jul 22 '16 at 22:33
Thanks Simon, I followed your advice and made my example reproducible. — CubicTom, Jul 25 '16 at 09:17
I also tried using apply: `new2.array <- apply(old.array , 1:2 , FUN = function(x) spi(ts(x, freq=12, start=c(1901,1)), 1, na.rm = T) )`, which is very fast but only gives me a matrix with 72*36=2592 elements, each of which is a list of 8. I am only interested in the second one of these 8 ($fitted), which is a list of the desired function values corresponding to each entry along d3. — CubicTom, Jul 25 '16 at 09:26
If I understand correctly then you could just do this: `new2.array <- apply(old.array , 1:2 , FUN = function(x) spi(ts(x, freq=12, start=c(1901,1)), 1, na.rm = T)$fitted )` — Simon O'Hanlon, Jul 25 '16 at 10:11
This solves the nested-lists problem! However, the dimensions do not yet come in the right order. Wrapping everything in `aperm` helps here, such that the outcome is identical to my looped example. Measuring execution time with tictoc gives me 36.51 sec on this solution compared to 39.88 sec on my looped executable example. Multiplying each dimension by 10 in the real data, this will be a significant improvement (though admittedly not quite as big as I was hoping). Thank you Simon! — CubicTom, Jul 25 '16 at 11:39

score 0 · Accepted Answer · answered Jul 22 '16 at 11:32

It would have been better with a reproducible example, here is what i come up with:

First create the cluster to use

cl <- makeCluster(6, type = "SOCK")
registerDoSNOW(cl)

Then you create the loop and close the cluster:

zz <- foreach(i = 1:720, .combine = c) %:% 
foreach(j = 1:360, .combine = c ) %dopar% {
new.listoflists <- FUN(old.array[i,j,])
new.array[i,j,] <- new.listoflists$list
}
stopCluster(cl)

This will create a list zz containing every iteration of new.array[i,j,], then you can bind them together with:

new.obj <- plyr::ldply(zz, data.frame)

Hope this helps you!

Thanks, tia_0 ! This helped me a lot to understand how to parallelize for-loops in my case! I thus mark this as the right answer. For interested readers: @Simon 's answer in a comment to my initial question also works like a charm. — CubicTom, Jul 26 '16 at 13:05

score 0 · Answer 2 · answered Jul 23 '16 at 07:40

I did not use as much of dimensions as your question because I wanted to ensure the behavior was correct. So here I use mapply which take multiple arguments. The result is a list of the results. Then I wrapped it with matrix() to get the dimensions you hoped for. Please note that i is repeated using times and j is repeated using each. This is critical as matrix() put entries by row first then wraps to the next column when the number of row is reached.

new.array = array(1:(5*10*4), dim=c(5,10,4))

# FUN: function which returns lists of 
FUN <- function(x){
    list(lapply(x, rep, times=3))
}

# result of the computation
result <- matrix(
    mapply(
        function(i,j,...){

            FUN(new.array[i,j,])
        }
        ,i = rep(1:nrow(new.array),times=ncol(new.array))
        ,j = rep(1:ncol(new.array),each=nrow(new.array))
        ,new.array=new.array
    )
    ,nrow=nrow(new.array)
    ,ncol=ncol(new.array)
)

Thanks for your reply! I tried to adopt your code, but stopped its execution after one hour without any result: `new3.array = array(dim=c(72,36,136)) # result of the computation new3.array <- matrix( mapply( function(i,j,...){ spi(ts(old.array[i,j,], freq=12, start=c(1901,1)), 1, na.rm = T) } ,i = rep(1:nrow(old.array),times=ncol(old.array)) ,j = rep(1:ncol(old.array),each=nrow(old.array)) ,old.array=old.array ) ,nrow=nrow(old.array) ,ncol=ncol(old.array) )` Am I missing anything here? — CubicTom, Jul 25 '16 at 11:42

Parallelize nested for-loop on 3 dimensional array in R

2 Answers2