
I'm trying to understand what happens behind the Rcpp::sourceCpp() call in a parallelized environment. This was recently addressed in part in the question: Using Rcpp function in parLapply on Windows.

In that post, Dirk said:

"You need to run the sourceCpp() call in each spawned process, or else get them your code."

This was in response to the questioner's attempt to distribute the Rcpp function to the worker processes, which was done via:

clusterExport(cl = cl, varlist = "payoff")

I'm confused about why this doesn't work. I thought this was exactly the purpose of clusterExport().


1 Answer


The issue here is that compiled code is not "exportable" to the spawned processes unless it is embedded in a package, because of how compiled shared libraries are loaded into each R process.

clusterExport() is designed to distribute R objects (data and plain R functions) to the workers by copying them from the master process.

When you use clusterExport() on an Rcpp function, the workers receive only the R wrapper, not the underlying shared library. That is to say, the shared library built by R CMD SHLIB (invoked from Attributes.R) is not shared with or exported to the workers. As a result, when the Rcpp function is then called on a worker, R cannot find the correct shared library.
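A minimal reproduction, assuming a working C++ toolchain and two local PSOCK workers from the parallel package (the strike values and data vector are made up for illustration): the function works in the master session, but the exported copy fails on the workers. My understanding is that external pointers are not preserved by serialization (they arrive as NULL pointers), so the wrapper that clusterExport() ships no longer points at any loaded code.

```r
library(parallel)

# Compile payoff() in the master R session only
Rcpp::cppFunction("NumericVector payoff(double strike, NumericVector data) {
    return pmax(data - strike, 0);
}")

# Works in the master, where the shared library is loaded: 0 0 1 2
local_res <- payoff(1, c(0, 1, 2, 3))

cl <- makeCluster(2)                   # PSOCK workers: fresh R processes
clusterExport(cl, varlist = "payoff")  # ships only the R wrapper

# On the workers the wrapper's pointer does not refer to any loaded code,
# so the .Call() fails; capture the error instead of stopping the script.
res <- tryCatch(
  parLapply(cl, 1:2, function(k) payoff(k, c(0, 1, 2, 3))),
  error = function(e) e
)
stopCluster(cl)

print(conditionMessage(res))
```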

Take the previous question's function:

Rcpp::cppFunction("NumericVector payoff( double strike, NumericVector data) {
    return pmax(data - strike, 0);
}")

Note: I'm using cppFunction() instead of sourceCpp(), but the results are equivalent since cppFunction() calls sourceCpp() to create the function.

Typing the function name:

payoff

Yields the R wrapper, whose body passes a pointer into the shared library to .Call():

function (strike, data) 
.Primitive(".Call")(<pointer: 0x1015ec130>, strike, data)

This shared library is loaded only in the process that compiled the function, so the pointer is meaningless in any other process.

This is why it is ideal to embed compiled code in a package and then load that package on each worker.
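If building a package is not an option, the other route Dirk mentions in the question (running sourceCpp() in each spawned process) can be sketched as follows. This assumes every worker has a C++ toolchain and that compiling once per worker is acceptable; the object name payoff_src is just for illustration, and cppFunction() is used to match the example above.

```r
library(parallel)

# Keep the C++ source as a string so it can be sent to every worker
payoff_src <- "NumericVector payoff(double strike, NumericVector data) {
    return pmax(data - strike, 0);
}"

cl <- makeCluster(2)
clusterExport(cl, varlist = "payoff_src")

# Compile inside each worker: clusterEvalQ() evaluates in the worker's
# global environment, so payoff() is defined there, next to the shared
# library that this worker built and loaded itself.
invisible(clusterEvalQ(cl, {
  Rcpp::cppFunction(payoff_src)
  NULL
}))

res <- parLapply(cl, 1:2, function(k) payoff(k, c(0, 1, 2, 3)))
stopCluster(cl)
str(res)
```

Each worker compiles its own copy, so startup cost grows with the number of workers; a package avoids that by compiling once at install time.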

  • So, if I add my one .cpp file to a package and export it to the workers using clusterEvalQ, it should resolve my issue? – Scott White Jul 22 '16 at 05:25
  • Yup, as the package handles the wrapping of the compiled code. – coatless Jul 22 '16 at 05:26
  • Thanks for clearing that up. I'm starting to look around for easy ways of making a package with a single file. Do you have a recommended resource? – Scott White Jul 22 '16 at 05:33
  • I accepted the answer, and tried to upvote but I don't have enough reputation. Thank you for your help! – Scott White Jul 22 '16 at 05:49
  • Plenty of examples of making a package with a single file in my tutorial slides. Start from `Rcpp.package.skeleton()`, preferably with pkgKitten installed, or use the package skeleton generator built into RStudio (and select "Package w/Rcpp"). Because each time you create a package using devtools $DEITY kills a kitten. Just kidding. Nice answer otherwise and to the (very important) point. Upvoted. – Dirk Eddelbuettel Jul 22 '16 at 11:37
  • Thanks for the advice you two! I was able to create my package using this link http://web.mit.edu/insong/www/pdf/rpackage_instructions.pdf. When I have more time I'll definitely give your book a read through Dirk! – Scott White Jul 24 '16 at 19:46
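For the package route, a self-contained sketch built from Rcpp.package.skeleton() (as Dirk suggests above) might look like the following. The package name payoffpkg, the file payoff.cpp, and the private library directory under tempdir() are hypothetical; a real project would keep the package on disk and install it once. Each worker then loads the package with clusterEvalQ(), which loads that worker's own copy of the package's shared library.

```r
library(parallel)

# Hypothetical package name and locations, all under tempdir()
pkg_name <- "payoffpkg"
lib_dir  <- file.path(tempdir(), "rlib")
dir.create(lib_dir, showWarnings = FALSE)

# C++ source with an export attribute so compileAttributes() wraps it
cpp_file <- file.path(tempdir(), "payoff.cpp")
writeLines(c(
  "#include <Rcpp.h>",
  "using namespace Rcpp;",
  "",
  "// [[Rcpp::export]]",
  "NumericVector payoff(double strike, NumericVector data) {",
  "    return pmax(data - strike, 0);",
  "}"
), cpp_file)

# Build a package skeleton around the single .cpp file, then install it
Rcpp::Rcpp.package.skeleton(pkg_name, path = tempdir(), cpp_files = cpp_file,
                            example_code = FALSE, force = TRUE)
install.packages(file.path(tempdir(), pkg_name), lib = lib_dir,
                 repos = NULL, type = "source")

cl <- makeCluster(2)
clusterExport(cl, varlist = "lib_dir")
# Loading the package on each worker also loads its compiled code there
invisible(clusterEvalQ(cl, library(payoffpkg, lib.loc = lib_dir)))

res <- parLapply(cl, 1:2, function(k) payoff(k, c(0, 1, 2, 3)))
stopCluster(cl)
str(res)
```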