I have an R script which compiles C++ code via `sourceCpp("prog.cpp")` and then calls the function `go` that is exported from `prog.cpp`. This C++ code then makes quite a few calls back to R and (after quite a long time) finally returns the result.
I think I should start making use of the fact that my laptop has 4 cores and parallelize things. However, before I run into unexpected problems, may I ask what is supported and what is not?
The task can be approached in a few ways:
- (This is what I would like to do, if possible.) Call `clusterApply` in R. The function passed to `clusterApply` will then call this C++ function, which means it will be called 4 times in parallel.
  - Will all 4 instances of this C++ function be isolated from each other?
  - In particular, will the global variables used by `prog.cpp` exist as 4 isolated instances or as just one? (Don't throw rocks at me... I know globals are best avoided.)
  - Will I run into problems when the C++ code calls an R function, which then calls a function from a compiled CRAN package?
  - If not: would calling `sourceCpp("prog.cpp")` inside `clusterApply` help? (Compilation time is negligible compared to the long time `go` takes to return.)
- (From what I've read, this is not going to work, but let me ask anyway for completeness' sake.) Can I call `go` from R only once (as I'm doing now) and create 4 threads inside the C++ code?
  - I noticed that compiled CRAN packages tend not to do this, even when the tasks are computationally expensive, which makes me suppose that it might not be supported.
  - In particular, will I run into problems when the threaded C++ code calls back to R? (In case it matters, the R function called by the C++ code then calls a function from a compiled CRAN package.)
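As background for the globals question: `clusterApply` runs its function in separate worker R processes (created by `makeCluster`), and every process that loads the compiled code gets its own copy of that code's globals, whereas threads inside a single process all share one copy. The following plain C++ sketch illustrates the difference, using POSIX `fork()` as a stand-in for a separate worker process (so it only builds on Unix-like systems). `g_counter`, `count_with_threads` and `count_with_processes` are hypothetical names, not anything taken from `prog.cpp`.

```cpp
#include <atomic>
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <thread>
#include <unistd.h>
#include <vector>

// A global, like the ones used in prog.cpp (hypothetical name).
std::atomic<int> g_counter{0};

// n threads in ONE process: they all increment the same global.
int count_with_threads(int n) {
    g_counter = 0;
    std::vector<std::thread> pool;
    for (int i = 0; i < n; ++i)
        pool.emplace_back([] { g_counter.fetch_add(1); });
    for (auto& t : pool) t.join();
    return g_counter.load();  // n: every thread saw the same variable
}

// n child processes (stand-ins for cluster workers): each one
// increments only its own private copy of the global.
int count_with_processes(int n) {
    g_counter = 0;
    for (int i = 0; i < n; ++i) {
        pid_t pid = fork();
        if (pid == 0) {            // child process
            g_counter.fetch_add(1);
            _exit(0);              // child's copy is discarded on exit
        }
        if (pid > 0) waitpid(pid, nullptr, 0);  // parent waits for child
    }
    return g_counter.load();  // 0: the children's increments never reach the parent
}
```

With 4 workers, `count_with_threads(4)` returns 4 while `count_with_processes(4)` returns 0, because each child's increment stays inside that child. The flip side is that a threaded `go` would share every global across its threads, which is one more thing to audit before threading it.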
I googled and know that there is such a thing as RcppParallel. However, quoting its main page:
API Restrictions
The code that you write within parallel workers should not call the R or Rcpp API in any fashion.
So I suppose I can't use RcppParallel because, as I said, my C++ code calls R many times (and the time spent in these calls is comparable to the time spent in C++, so I'd dearly like to parallelize them, as `clusterApply` would allow me to).
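The restriction quoted above is usually met by a split like the one sketched below: worker threads touch only plain C++ data, and every call back into R happens on the main thread after the threads have been joined. This is a minimal sketch under the assumption that the R callbacks can be deferred until the parallel C++ part is done; `go_sketch`, `heavy_cpp_work` and `RCallback` are hypothetical names, and `RCallback` is just a `std::function` standing in for calling an R function through Rcpp.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for a call back into R (e.g. through Rcpp).
// It must only ever be invoked from the main thread.
using RCallback = std::function<double(double)>;

// Pure C++ work: safe on any thread because it touches no R objects.
double heavy_cpp_work(double x) {
    double acc = 0.0;
    for (int i = 1; i <= 1000; ++i) acc += x / i;
    return acc;
}

std::vector<double> go_sketch(const std::vector<double>& inputs,
                              const RCallback& r_callback,
                              unsigned n_threads) {
    if (n_threads == 0) n_threads = 1;
    std::vector<double> partial(inputs.size());

    // Parallel phase: each thread fills its own stride of `partial`,
    // so no two threads ever write the same element.
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < n_threads; ++t) {
        pool.emplace_back([&, t] {
            for (std::size_t i = t; i < inputs.size(); i += n_threads)
                partial[i] = heavy_cpp_work(inputs[i]);
        });
    }
    for (auto& th : pool) th.join();

    // Serial phase, back on the main thread: only here is R called.
    std::vector<double> out(inputs.size());
    for (std::size_t i = 0; i < inputs.size(); ++i)
        out[i] = r_callback(partial[i]);
    return out;
}
```

Note that this only parallelizes the C++ half of the work: the R calls still run one after another. Since the time spent in R is comparable to the time spent in C++ here, that limitation is exactly why spreading the work over several R processes with `clusterApply` looks attractive.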