
I am working on a project that requires parallel processing in R, and I am new to the doParallel package. What I would like to do is use a parallelized foreach loop. Due to the nature of the problem, this foreach loop needs to be executed many times. The problem I am having is that I use cppFunction() and cfunction() within the loop.

The current workaround is to call clusterEvalQ() on the cluster to compile the relevant functions on each worker. However, this is extremely slow (~10 seconds for 4 cores). I have included the relevant code below. Is there any way to speed this up? Thanks.

clusterEvalQ(cl, {
  library("inline")
  library("Rcpp")
  source("C_functions.R")  # recompiles the C/C++ functions on every worker
})
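
For context, the surrounding setup looks roughly like this (the core count and the loop body are placeholders, not my actual code):

library(doParallel)

cl <- makeCluster(4)    # placeholder core count
registerDoParallel(cl)

# ... the clusterEvalQ() call above runs here, once per new cluster ...

result <- foreach(i = 1:100) %dopar% {
  # placeholder body: calls the functions compiled from C_functions.R
}

stopCluster(cl)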

1 Answer


Yes, there is a way to speed it up by taking the compilation hit only once.

In particular, move all of the compiled code into an R package. From there, install the package on every node of the cluster and then load it on each worker. Inside the parallel code, call the functions from the package.
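
In outline, it would look something like this (the package name myCFunctions and the function times_two are hypothetical stand-ins for your own):

library(doParallel)

cl <- makeCluster(4)
registerDoParallel(cl)

# Loading a pre-built package is cheap: its shared library was compiled
# once at install time, so no per-worker compilation happens here
clusterEvalQ(cl, library(myCFunctions))

result <- foreach(i = 1:100) %dopar% {
  myCFunctions::times_two(i)  # hypothetical exported function
}

stopCluster(cl)

Equivalently, foreach()'s .packages argument loads the package on each worker for you, e.g. foreach(i = 1:100, .packages = "myCFunctions") %dopar% { ... }.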

This is required because C++ functions imported into R are session-specific. As a result, each session requires its own compilation. The compilation is the "expensive" part.

Also, do not use the inline package. Instead, you should use Rcpp Attributes.
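
With attributes, the C++ code carries an `// [[Rcpp::export]]` tag and is compiled with `sourceCpp()`; the same code can later move verbatim into a package's `src/` directory. A minimal sketch (the function below is just the standard skeleton example):

library(Rcpp)

# Attributes replace inline's cfunction()/cxxfunction() interface:
# the export tag generates the R-level wrapper automatically
sourceCpp(code = '
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector timesTwo(NumericVector x) {
  return x * 2;
}
')

timesTwo(1:4)  # 2 4 6 8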

coatless
  • Thanks. Do you have any resources to help me write a package with cppFunction()? I know how to do it for just normal R functions, but not with these added features. – philbo_baggins Jul 28 '19 at 17:21
  • 1
    @pbn990 the `*Cpp()` functions are meant for session use. To create an _R_ package using C++ code, read the last portion of the [Rcpp Introduction vignette](https://cran.r-project.org/web/packages/Rcpp/vignettes/Rcpp-introduction.pdf#page=7). In particular, add the compiled code to the `src/` directory, add to `Imports: Rcpp` in the `DESCRIPTION` file, and add an `importFrom(Rcpp, evalCpp)` line to `NAMESPACE`. – coatless Jul 28 '19 at 17:24
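
Following the comment above, a minimal sketch of the one-time package setup. The package name myCFunctions and the .cpp file name are hypothetical; Rcpp.package.skeleton() wires up the DESCRIPTION (Imports: Rcpp, LinkingTo: Rcpp) and NAMESPACE (importFrom(Rcpp, evalCpp), useDynLib) entries for you:

library(Rcpp)

# One-time: generate a package skeleton already configured for attributes
Rcpp.package.skeleton("myCFunctions",
                      example_code = FALSE,
                      cpp_files = "times_two.cpp")  # hypothetical source file

# After editing anything under src/, regenerate the R-level bindings
compileAttributes("myCFunctions")

# Then install it, e.g. from a shell:  R CMD INSTALL myCFunctions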