
I have an R script which compiles C++ code via sourceCpp("prog.cpp") and then calls the function go that is exported from prog.cpp. This C++ code makes quite a few calls back to R and, after quite a long time, finally returns the result.
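For concreteness, the shape of the setup is roughly the following (the body of go here is a made-up stand-in; the real callback calls a function from a compiled CRAN package):

```r
library(Rcpp)

# Illustrative stand-in for prog.cpp: go() receives an R function and
# calls back into R on every iteration before returning its result.
sourceCpp(code = '
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double go(Function r_callback, int n) {
  double total = 0.0;
  for (int i = 0; i < n; ++i) {
    total += as<double>(r_callback(i + 1));  // each iteration crosses into R
  }
  return total;
}
')

go(function(x) sqrt(x), 10)  # in reality the callback calls a CRAN package
```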

I think I should start making use of the fact that my laptop has 4 cores. I want to parallelize things. However, before running into unexpected problems, might I ask what is supported and what is not?

The task can be approached in a few ways:

  • (This is what I would like to do if possible) Call clusterApply in R. The function that is clusterApplied will then call this C++ function, which means this function will be called 4 times in parallel. (A rough sketch of this follows after this list.)
    • Will all 4 instances of this C++ function be isolated from each other?
    • In particular, will global variables used by prog.cpp come in 4 isolated instances or just one instance? (Don't throw rocks at me... I know globals are best avoided)
    • Will I run into problems when the C++ code calls an R function which will then call a function from a compiled package from CRAN?
    • If not: Would calling sourceCpp("prog.cpp") inside clusterApply help? (compilation time is negligible in comparison to the long time necessary for go to return)
  • (From what I read this is not going to work, but let's ask about this anyway for completeness' sake): Can I call go from R code only once (as I'm doing now) and create 4 threads inside the C++ code?
    • I noticed compiled packages from CRAN tend not to do this, even if the tasks are computationally expensive - this makes me suspect doing this might not be supported
    • In particular, will I run into problems when the threaded C++ code calls back to R? (If this matters, the R function called by the C++ code will then call a function from a compiled package from CRAN)
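To make the first approach concrete, this is roughly what I have in mind (the arguments to go are invented for illustration; whether sourceCpp belongs inside the worker function is exactly what I am asking above):

```r
library(parallel)

# Rough sketch of approach 1: four worker processes, each compiling
# prog.cpp and then calling go() on its own chunk of the work.
cl <- makeCluster(4)

results <- clusterApply(cl, 1:4, function(chunk) {
  Rcpp::sourceCpp("prog.cpp")   # per-worker compilation (see sub-question above)
  go(chunk)                     # go() calls back into this worker's R process
})

stopCluster(cl)
```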

I Googled and I know there exists such a thing as RcppParallel. However, quoting their main page:

API Restrictions

The code that you write within parallel workers should not call the R or Rcpp API in any fashion.

Then I suppose I can't use RcppParallel because, as I said, my C++ code calls R many times (and the time spent in these calls is comparable to the time spent in C++), so I'd dearly like to parallelize them, as clusterApply would allow me to.

  • I believe that when using the `parallel` and/or `future` packages, your use of multiple processes (via fork or new processes) is relatively protected, and they can make calls to underlying DLLs quite freely without risk. (There is no need to discuss thread-safe topics here.) In this workflow, you *will* need to call `sourceCpp` in each, typically with something like `parallel::clusterEvalQ(cl, { sourceCpp("prog.cpp"); })` (untested). Alternatively, you can compile it into a persistent/permanent DLL and just attach it directly, such as in a local package (which is recommended for other reasons). – r2evans May 20 '19 at 18:09
  • BTW: if you are truly trying to speed things up, calling R from Rcpp typically imposes overhead and is discouraged *in general*. If you search for @DirkEddelbuettel's posts on the subject, you'll see clear commentary/guidance. Without seeing the actual source code, it's difficult to quantify how much this might hurt your process, but ... it can be quite "expensive". I know it's not always easy to refactor those R calls out of your code, but it might improve performance significantly. – r2evans May 20 '19 at 18:16
  • @r2evans Thank you for your comments. "*In this workflow, you will need to call sourceCpp in each*" - No problems here, but doesn't `sourceCpp` execute something along the lines of `g++ -c prog.cpp -o prog.o` - will two `prog.o`'s from two parallelized `sourceCpp`'s not clash? "*calling R from Rcpp typically imposes overhead and is discouraged in general.*" - The alternative is to have the C++ code return objects that the R code should pass as arguments to the package from CRAN; then R would have to inform C++ about the returned values, and so on, in a loop. (cont'd) –  May 20 '19 at 18:42
  • @r2evans (cont'd) The whole purpose of the C++ code is to construct arguments to pass to the package from CRAN, which is surprisingly non-trivial. I guess I *could* do the same work in R as I'm doing in C++, but practical considerations dominate here: I just know C++ far better than R, so it is far faster for me to do any non-trivial work in C++ than in R... so I'm simply moving to C++ all the code I can move... I guess this is not necessarily optimal, but... –  May 20 '19 at 18:47
  • I believe I understand what you are saying, and I can commiserate on language-of-comfort. However, R is not the speediest in some senses, and crossing from Rcpp to R is very expensive. If "all you are doing" (emphasis on the quotes) is constructing arguments, then I argue this might be a *fantastic* time to hone some native-R skills (I know that this is very easily said when I have no idea of the constraints/requirements of this task). – r2evans May 20 '19 at 18:59
  • @r2evans Problem is, *constructing arguments* is itself non-trivial. One call to the CRAN package takes 0.5 secs, approximately; *time spent in C++ (in between calls to R) rivals that*. Since there are many calls, the execution of the whole program takes *hours* if the number of iterations is set just to prove the concept; I'm going to leave this running for a few days in a row once I'm done developing. –  May 20 '19 at 19:06
  • @gaazkam what you are using is `snow` via the `parallel` package for the `makeCluster()`, `clusterApply()`, ... . Thus, the advice is the same between posts: You must compile the _C++_ per process or distribute the _C++_ code within an _R_ package. – coatless May 20 '19 at 21:51
  • (note: OP deleted comment questioning why I made the recommendation to close as duplicate.) – coatless May 20 '19 at 21:52

1 Answer


When you use clusterApply you are actually using (in your case) 4 different R processes. So yes, the C++ functions, any global variables, etc. will be separate. Even calling back to R from C++ is safe, since each C++ function has its own R process to communicate with. It goes even further: you should call sourceCpp via clusterApply, since otherwise the different R processes won't have the C++ function to call in the first place. An alternative would be building a package. Parallelizing within C++ (via RcppParallel, OpenMP or std::thread) is not possible in your case, since you want to call back to R from C++. BTW, I would try to get rid of these call-backs if possible.
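A minimal sketch of that workflow (the arguments to go are invented for illustration; in practice you would pass whatever your exported function expects):

```r
library(parallel)

cl <- makeCluster(4)

# Compile prog.cpp once in each worker process; every worker gets its own
# copy of the compiled code and its own global variables.
clusterEvalQ(cl, Rcpp::sourceCpp("prog.cpp"))

# Each worker calls go(), which may freely call back into that worker's
# own R process.
results <- clusterApply(cl, 1:4, function(i) go(i))

stopCluster(cl)
```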

While your first approach should work in principle, it is unclear whether you will get much performance gain, since parallel computation comes with its own set of caveats (memory consumption, communication overhead, ...).

Ralf Stubner