I am creating a package that I hope to eventually put onto CRAN. I have coded much of the package in C++
with the help of Rcpp
and now would like to enable parallelization of this C++
code. I am using the foreach
package, however, I am open to switch to snow
or a different library if this would work better.
I started by trying to parallelize a simple function:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// [[Rcpp::export]]
arma::vec rNorm_c(int length) {
return arma::vec(length, arma::fill::randn);
}
/*** R
n_workers <- parallel::detectCores(logical = F)
cl <- parallel::makeCluster(n_workers)
doParallel::registerDoParallel(cl)
n <- 10
library(foreach)
foreach(j = rep(n, n),
.noexport = c("rNorm_c"),
packages = "Rcpp") %dopar% {rNorm_c(j)}
*/
I added the .noexport
because without, I get the error Error in { : task 1 failed - "NULL value passed as symbol address"
. This led me to this SO post which suggested doing this.
However, I now receive the error Error in { : task 1 failed - "could not find function "rNorm_c""
, presumably because I have not followed the top answers instructions to load the function separately at each node. I am unsure of how to do this.
This SO post demonstrates how to do this by writing the C++
code inline, however, since the C++
code for my package is multiple functions, this is likely not the best solution. This SO post advises to create a local package for the workers to load and make calls to, however, since I am hoping to make this code available in a CRAN package, it does not seem as though a local package would be possible unless I wanted to attempt to publish two CRAN packages.
Any suggestions for how to approach this or references to resources for parallelization of Rcpp code would be appreciated.
EDIT:
I used the above function to create a package called rnormParallelization
. In this package, I also included a couple of R functions, one of which made use of the snow
package to parallelize a for loop using the rNorm_c
function:
rNorm_samples_for <- function(num_samples, length){
sample_mat <- matrix(NA, length, num_samples)
for (j in 1:num_samples){
sample_mat[ , j] <- rNorm_c(length)
}
return(sample_mat)
}
rNorm_samples_snow1 <- function(num_samples, length){
clus <- snow::makeCluster(3)
snow::clusterExport(clus, "rNorm_c")
out <- snow::parSapply(clus, rep(length, num_samples), rNorm_c)
snow::stopCluster(clus)
return(out)
}
Both functions work as expected:
> rNorm_samples_for(2, 3)
[,1] [,2]
[1,] -0.82040308 -0.3284849
[2,] -0.05169948 1.7402912
[3,] 0.32073516 0.5439799
> rNorm_samples_snow1(2, 3)
[,1] [,2]
[1,] -0.07483493 1.3028315
[2,] 1.28361663 -0.4360829
[3,] 1.09040771 -0.6469646
However, the parallelized version works considerably slower:
> microbenchmark::microbenchmark(
+ rnormParallelization::rNorm_samples_for(1e3, 1e4),
+ rnormParallelization::rNorm_samples_snow1(1e3, 1e4)
+ )
Unit: milliseconds
expr min lq
rnormParallelization::rNorm_samples_for(1000, 10000) 217.0871 249.3977
rnormParallelization::rNorm_samples_snow1(1000, 10000) 1242.8315 1397.7643
mean median uq max neval
320.5456 285.9787 325.3447 802.7488 100
1527.0406 1482.5867 1563.0916 3411.5774 100
Here is my session info:
> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rnormParallelization_1.0
loaded via a namespace (and not attached):
[1] microbenchmark_1.4-7 compiler_4.1.1 snow_0.4-4
[4] parallel_4.1.1 tools_4.1.1 Rcpp_1.0.7