I have an Rcpp-function that outputs a large matrix which I want to save as an R object. My idea was to speed things up using my Rcpp function in parallel with the package foreach.
For the same matrix size, using foreach takes about more than five times as long on my Windows machine as just running the function without foreach (exclduing the set up of the workers). I am aware of the issues surrounding the execution of very small tasks in parallel (e.g. Why is the parallel package slower than just using apply?). Also I am willing to leave aside theoretical problems with running Random Number Generators in parallel as the results might no longer be truly random.
As my subtasks should be large enough, apparently the Rcpp-function I wrote does not work well in parallel, but I do not know why. Is using an RNG in the Rcpp-function simply a task that cannot be parallelized? Apart from that: is there an optimal i and with that an optimal ncol (here n_bootstrap)of my submatrices in foreach? Any help is much aprreciated. Also please feel free to comment on the code in general if you like.
Clarification: I compiled a package and use mypackage::funC within foreach
Here is an example code in R:
y <- funC(n_bootstrap = 250, n_obs_censusdata = 300000,
locationeffects = as.numeric(1:200),
residuals = as.numeric(1:20000),
X = matrix(as.numeric(1:3000000), ncol = 10),
beta_sample = matrix(as.numeric(1:2500), ncol = 250))
in parallel:
no_cores <- parallel::detectCores() - 2
cl <- parallel::makeCluster(no_cores)
doParallel::registerDoParallel(cl)
y <- foreach(i=1:5, .combine = "cbind") %dopar% {
funC(n_bootstrap = 50,
n_obs_censusdata = 300000, locationeffects = as.numeric(1:200),
residuals = as.numeric(1:20000),
X = matrix(as.numeric(1:3000000), ncol = 10),
beta_sample = matrix(as.numeric(1:2500), ncol = 250))
}
parallel::stopCluster(cl)
added: with bigstatsr
y <- bigstatsr::FBM(nrow = 300000, ncol = 250, type = "double")
bigstatsr::big_apply(y, a.FUN = function(y, ind, fun) {
y[, ind] <- fun(n_bootstrap = length(ind),
n_obs_censusdata = 300000,
locationeffects = as.numeric(1:200),
residuals = as.numeric(1:20000),
X = matrix(as.numeric(1:3000000), ncol = 10),
beta_sample = matrix(as.numeric(1:2500), ncol = 250))
NULL
}, a.combine = 'c', ncores = bigstatsr::nb_cores(), fun = funC)+
here is the Rcpp-code:
// -*- mode: C++; c-indent-level: 4; c-basic-offset: 4; indent-tabs-mode: nil; -*-
#include <RcppEigen.h>
#include <random>
using namespace Rcpp;
// [[Rcpp::depends(RcppEigen)]]
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::export]]
SEXP funC(const int n_bootstrap,
const int n_obs_censusdata,
const Eigen::Map<Eigen::VectorXd> locationeffects,
const Eigen::Map<Eigen::VectorXd> residuals,
const Eigen::Map<Eigen::MatrixXd> X,
const Eigen::Map<Eigen::MatrixXd> beta_sample)
{
// --------- create random sample of locations and of residuals --------- //
// initialise random seeds
std::random_device rd; // used to obtain a seed for the number engine
std::mt19937 gen(rd()); // Mersenne Twister engine
// initialize distributions for randam locations and residuals
const int upperlocation = locationeffects.size();
const int upperresiduals = residuals.size();
std::uniform_int_distribution<> distrloc(1, upperlocation);
std::uniform_int_distribution<> distrres(1, upperresiduals);
// initialize and fill matrix for randam locations and residuals
Eigen::MatrixXd LocationEffectResiduals(n_obs_censusdata, n_bootstrap);
for (int i=0; i<n_obs_censusdata; ++i)
for (int j=0; j<n_bootstrap; j++)
LocationEffectResiduals(i,j) = locationeffects[distrloc(gen)-1] + residuals[distrres(gen)-1]; // subtract 1 because in C++ indices start with 0
// ----- create Xbeta ------- //
Eigen::MatrixXd Xbeta = X * beta_sample;
// ----- combine results ------- //
Eigen::MatrixXd returnmatrix = Xbeta + LocationEffectResiduals;
return Rcpp::wrap(returnmatrix);
}