
Is it possible to apply a function to each cell of a DataFrame/matrix in parallel (multithreaded) in R?

I'm aware of apply(), but it doesn't seem to support multithreading natively:

x <- cbind(x1 = 3, x2 = c(4:1, 2:5))

cave <- function(x, c1, c2) {
  a <- 1000
  for (i in 1:100) { # Useless busy work
    b <- matrix(runif(a * a), nrow = a, ncol = a)
  }
  c1 + c2 * x
}

apply(x, 1, cave,  c1 = 3, c2 = 4)

returns:

   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
x1   15   15   15   15   15   15   15   15
x2   19   15   11    7   11   15   19   23

Instead, I would like to use more than one core to perform the operation, since the applied function may be complex. For example, in pandas one can apply a function to each cell of a DataFrame using multiple threads.

Franck Dernoncourt

1 Answer


There are probably a few ways to do this, but I've always found it easiest to run parallel operations on list objects. If you convert the input matrix to a list with one element per row, the function can be applied to each element using parallel::parLapply as follows:

## convert the input object to a list
x.list <- split(t(x), rep(1:nrow(x), each = ncol(x)))

## parallelize the operation over e.g. 2 cores
cl <- parallel::makeCluster(2)
out <- parallel::parLapply(cl, x.list, cave, c1 = 3, c2 = 4)
parallel::stopCluster(cl)

## transform the output list back to a matrix
out <- t(matrix(unlist(out, use.names = FALSE), nrow = ncol(x)))
colnames(out) <- colnames(x)
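The same steps can be wrapped in a small reusable helper. This is only a sketch: `par_row_apply` is a hypothetical name (not part of the parallel package), it rebuilds the matrix with `do.call(rbind, ...)` instead of `unlist()`/`t()`, and it assumes `FUN` returns a vector of length `ncol(m)` for every row. `on.exit()` shuts the workers down even if `FUN` throws an error.

```r
library(parallel)

## Hypothetical helper: apply FUN to each row of matrix m on n_cores workers.
## Assumes FUN returns a vector of length ncol(m) for every row.
par_row_apply <- function(m, FUN, ..., n_cores = 2) {
  cl <- makeCluster(n_cores)
  on.exit(stopCluster(cl))     # stop the workers even if FUN fails
  ## one list element per row of m
  rows <- lapply(seq_len(nrow(m)), function(i) m[i, ])
  res <- parLapply(cl, rows, FUN, ...)
  out <- do.call(rbind, res)   # stack the per-row results into a matrix
  colnames(out) <- colnames(m)
  out
}

x <- cbind(x1 = 3, x2 = c(4:1, 2:5))
## Cheap stand-in for cave() without the busy-work loop
f <- function(x, c1, c2) c1 + c2 * x
par_row_apply(x, f, c1 = 3, c2 = 4)
```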

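This should work across platforms: makeCluster() creates a PSOCK cluster by default, so it also runs on Windows. Note that the workers are separate R processes rather than threads.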

> x
     x1 x2
[1,]  3  4
[2,]  3  3
[3,]  3  2
[4,]  3  1
[5,]  3  2
[6,]  3  3
[7,]  3  4
[8,]  3  5
> out
     x1 x2
[1,] 15 19
[2,] 15 15
[3,] 15 11
[4,] 15  7
[5,] 15 11
[6,] 15 15
[7,] 15 19
[8,] 15 23
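The parallel package also provides `parApply()`, a parallel counterpart of `apply()` that splits the matrix internally, so the manual list conversion can be skipped. A minimal sketch, reusing `x` from the question and a stand-in for `cave()` without the busy work (an assumption to keep the example quick). Like `apply(x, 1, ...)`, it returns one column per row of `x`, so the result is transposed with `t()`.

```r
library(parallel)

x <- cbind(x1 = 3, x2 = c(4:1, 2:5))
## Stand-in for cave() without the busy-work loop
f <- function(x, c1, c2) c1 + c2 * x

cl <- makeCluster(2)
## Same arguments as apply(x, 1, f, c1 = 3, c2 = 4), plus the cluster object
res <- parApply(cl, x, 1, f, c1 = 3, c2 = 4)
stopCluster(cl)

## As with apply(), per-row results become columns; transpose to match x
out <- t(res)
out
```

If the applied function calls other user-defined functions or package code, those generally have to be made available on the workers first, for example with `clusterExport()` or `clusterEvalQ()`.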
Franck Dernoncourt
Shaun Wilkinson