Let me say first that I've read Writing R Extensions, the Rcpp package vignette, and that I've built a package from Rcpp.package.skeleton().

Since building my package, I added a function, multiGenerateCSVrow(), and then ran compileAttributes() on the package directory before R CMD build/R CMD INSTALL. After I load my package, I can run my function either directly or via foreach() with the %do% operator.

When I try to run it in parallel, however, I get an error:

cl <- makePSOCKcluster(8)
registerDoParallel(cl)
rows <- foreach(i=1:8,.combine=rbind,.packages="myPackage") %dopar% multiGenerateCSVrow(scoreMatrix=NIsample,
                                                                   validMatrix = matrix(1,nrow=10,ncol=10),
                                                                   cutoffVector = rep(0,10),
                                                                   factorVector = randomsCutPlus1[i,],
                                                                   actualVector = rep(1,10),
                                                                   scaleSample = 1)
stopCluster(cl)

Error in multiGenerateCSVrow(scoreMatrix = NIsample, validMatrix = matrix(1,  : 
  task 1 failed - "NULL value passed as symbol address"

Here's the package NAMESPACE:

# Generated by roxygen2 (4.0.1): do not edit by hand 
useDynLib(myPackage)                                   
exportPattern("^[[:alpha:]]+")                       
importFrom(Rcpp, evalCpp) 

Here's the relevant chunk of RcppExports.cpp:

// multiGenerateCSVrow
SEXP multiGenerateCSVrow(SEXP scoreMatrix, SEXP validMatrix, SEXP cutoffVector, SEXP factorVector, SEXP actualVector, SEXP scaleSample);
RcppExport SEXP myPackage_multiGenerateCSVrow(SEXP scoreMatrixSEXP, SEXP validMatrixSEXP, SEXP cutoffVectorSEXP, SEXP factorVectorSEXP, SEXP actualVectorSEXP, SEXP scaleSampleSEXP) {
BEGIN_RCPP
    SEXP __sexp_result;
    {
        Rcpp::RNGScope __rngScope;
        Rcpp::traits::input_parameter< SEXP >::type scoreMatrix(scoreMatrixSEXP );
        Rcpp::traits::input_parameter< SEXP >::type validMatrix(validMatrixSEXP );
        Rcpp::traits::input_parameter< SEXP >::type cutoffVector(cutoffVectorSEXP );
        Rcpp::traits::input_parameter< SEXP >::type factorVector(factorVectorSEXP );
        Rcpp::traits::input_parameter< SEXP >::type actualVector(actualVectorSEXP );
        Rcpp::traits::input_parameter< SEXP >::type scaleSample(scaleSampleSEXP );
        SEXP __result = multiGenerateCSVrow(scoreMatrix, validMatrix, cutoffVector, factorVector, actualVector, scaleSample);
        PROTECT(__sexp_result = Rcpp::wrap(__result));
    }
    UNPROTECT(1);
    return __sexp_result;
END_RCPP
}

And RcppExports.R:

multiGenerateCSVrow <- function(scoreMatrix, validMatrix, cutoffVector, factorVector, actualVector, scaleSample) {
    .Call('myPackage_multiGenerateCSVrow', PACKAGE = 'myPackage', scoreMatrix, validMatrix, cutoffVector, factorVector, actualVector, scaleSample)
}   

What could it be looking for?

Patrick McCarthy
  • Does the cluster span several machines? Did you install the updated package on the other machines? – Dirk Eddelbuettel Jul 31 '14 at 18:42
  • Nope, one machine, run locally. – Patrick McCarthy Jul 31 '14 at 18:43
  • Check if the slaves can find other packages etc. At the end of the day, these are "just" other R processes, so make sure your path and settings are fine. – Dirk Eddelbuettel Jul 31 '14 at 18:45
  • I extended the .packages vector to include "Rcpp" and the other package dependencies, but no change. Is there a way I can log into the other R threads or somehow interact with them directly? – Patrick McCarthy Jul 31 '14 at 18:50
  • I'd love to help you here, but there is little to nothing to go on. To me, you are "merely" having issues with a parallel processing setup, so I would recommend reading the vignette of the package "parallel" which came with your copy of R. – Dirk Eddelbuettel Aug 01 '14 at 13:42
  • I up voted as I'm just experiencing the same issue. – JAponte Dec 12 '14 at 15:58
  • I don't have the setup to reproduce my issue, but I recently solved something similar by passing the `Rcpp` package to foreach along with my package. I was defining an `RNGScope` in my function for `runif`, but my package didn't explicitly depend upon or call `Rcpp` for some reason. – Patrick McCarthy Dec 12 '14 at 16:42
  • ... and apropos nothing (but for other lost souls) I managed to trigger the same error message via doing some ugly things with an Rcpp object and environments; but nothing I've been able to come up with a minimal example of. I mitigated my error by not re-creating extra copies of Rcpp objects in daughter environments. – russellpierce Jan 26 '16 at 02:23
  • +1 as I encounter the same issue. Any updates on this problem? – Matthias Schmidtblaicher Nov 16 '16 at 10:10
  • Hello. I have an Rcpp function that I need to put into a package so that I can run it the same way you did regarding `multiGenerateCSVrow`. Do you have any tutorials on this? – 89_Simple Jul 19 '20 at 23:59
  • I haven't thought about this in some time, but you should probably start with the authorities - https://cran.r-project.org/web/packages/Rcpp/vignettes/Rcpp-package.pdf – Patrick McCarthy Jul 20 '20 at 17:15
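
On the question raised in the comments about interacting with the worker processes directly: `parallel::clusterEvalQ()` and `parallel::clusterCall()` run code on every worker and return the results to the master session. The following is a minimal diagnostic sketch, not a fix. The package name `"stats"` is only a stand-in so that the snippet runs anywhere; it would be replaced by the package under test (`"myPackage"` in the question).

```r
library(parallel)

# Start a small local PSOCK cluster; each worker is a separate R process.
cl <- makePSOCKcluster(2)

# Library search paths as seen by each worker.
# Compare these with .libPaths() in the master session.
worker_lib_paths <- clusterEvalQ(cl, .libPaths())

# Stand-in package name (assumption); substitute the package under test.
pkg <- "stats"

# Can every worker find and load the package's namespace?
loaded_ok <- unlist(clusterCall(cl, function(p) {
  requireNamespace(p, quietly = TRUE)
}, pkg))

# For a package with compiled code, list the shared libraries that are
# loaded on each worker once the namespace has been loaded.
dll_names <- clusterCall(cl, function(p) {
  loadNamespace(p)
  names(getLoadedDLLs())
}, pkg)

stopCluster(cl)
```

If `loaded_ok` is `FALSE` for some worker, or the package's shared library is missing from `dll_names`, the workers are probably not seeing the same installation as the master session.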

4 Answers

15

I had a similar problem and I solved it by adding .noexport = c(<Functions that were implemented in C++>) to the foreach.

I am guessing these functions get imported from the global environment into the parallel contexts, but, since they are not ordinary functions, they don't actually work. This does mean the functions have to be loaded separately on each node; in my case that was a SNOW clusterCall() call that sourced various files including the C++ code.
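
A minimal sketch of this approach, assuming Rcpp, foreach, and doParallel are installed and a C++ toolchain is available. `addOne` is a hypothetical stand-in for the real C++ function, and `clusterEvalQ()` is used here in place of sourcing files. The likely mechanism (an assumption, not confirmed in this thread) is that a function compiled with `Rcpp::cppFunction()` or `Rcpp::sourceCpp()` holds a pointer to native code that is no longer valid once the function is serialized and sent to another process.

```r
library(foreach)
library(doParallel)  # also attaches the parallel package

cl <- makePSOCKcluster(2)
registerDoParallel(cl)

# Compile the C++ function separately on every worker.
# addOne is a hypothetical example function, not from the original post.
clusterEvalQ(cl, {
  Rcpp::cppFunction("double addOne(double x) { return x + 1.0; }")
  NULL  # return NULL so the compiled function is not sent back to the master
})

# The master session has its own compiled copy, e.g. for interactive use.
Rcpp::cppFunction("double addOne(double x) { return x + 1.0; }")

# .noexport stops foreach from shipping the master's copy to the workers,
# so each worker calls the copy it compiled itself.
res <- foreach(i = 1:4, .combine = c, .noexport = "addOne") %dopar% addOne(i)

stopCluster(cl)
```

Compiling on every worker costs some startup time, but it only happens once per worker rather than once per iteration.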

henine
11

I also had the problem that functions using Rcpp would not work within foreach. As suggested by Patrick McCarthy, I put the function in a package, installed and loaded the package, and passed it to foreach with .packages=("...").

I still got some errors, but that was resolved after updating all involved packages.

(I would have commented, but I do not have enough reputation and I thought this might be helpful for some people)

jmb
7

Inspired by the answers from @henine and @jmb, I tried the "reverse" option: I source my R file with the Rcpp functions inside my foreach loop and make sure to include "Rcpp" in the .packages option of foreach. It might not be the most efficient approach, but it does the job and is simple.

Something like:

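library(doParallel)  # also loads foreach and parallel

cl = makeCluster(n_cores, outfile="")
registerDoParallel(cl)

foreach(n = 1:N,.packages = "Rcpp",.noexport = "<name of Rcpp function>")%dopar%{
  # each worker sources (and so compiles) the Rcpp functions itself
  source("Scripts/Rccp_functions.R")
  ### do stuff with functions scripted in Rccp_functions.R
}

# the cluster was created explicitly, so stop it explicitly
stopCluster(cl)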

And similarly to @jmb, I would have commented, but don't have enough reputation :D

LaSy
  • I used this method for a long time and finally managed to try out the "proper" method by using the cpp file inside a package. I was disappointed to not get ANY increase in speed :( – Squeezie Nov 08 '19 at 14:32
  • You can also wrap the cpp function in an R function that calls `sourceCpp` first, based on the existence of the global cpp function object. Stylistically, this seems cleaner than having source in the loop body, but it is the same basic idea. – Ethan Nov 25 '21 at 16:23
0

I ran into this problem before. The solution for me was to compile the C++ code inside the function that runs the loops, before the C++ function is called there:

library(Rcpp)
sourceCpp('<the path to your cpp file>')

It works for me and is still quick.

Zoey