Overview:
I'm interested in parallelizing (across chains) a Gibbs sampler for a non-trivial regression problem that I've already implemented in serial via Rcpp/RcppEigen. I've read the documentation for RcppParallel
and RcppThread
and I want to know if my understanding of the challenges involved in parallelizing this code are accurate and if my proposed pseudocode using RcppThread
is viable.
Programming Challenge:
This regression problem requires inverting an updated design matrix each iteration of the Gibbs sampler. Consequently any new matrix (one per chain) needs to be "thread safe". That is, there is no danger of one thread writing to memory that another thread might also try to access. If this is done, I can then draw and store the regression coefficient samples (beta) by giving the Rcpp::parallelFor
a unique index with which to assign the samples. I'm wondering where/how would be best to initialize these thread specific matrices?. See below for my overall conceptual understanding and first guess at how I could essentially use the sample principle of assigning samples in parallel, to assign X's in parallel. Note This is assuming that Eigen objects are okay with concurrent index access in the same way I've seen std::vector<>'s memory accessed in the RcppThread
documentation.
#include "RcppEigen.h>
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::depends(RcppThread)]]
// [[Rcpp::depends(RcppEigen)]]
// Sampler class definition
#include "Sampler.h"
#include "RcppThread.h"
// [[Rcpp::export]]
Eigen::ArrayXXd fancyregression(const Eigen::VectorXd &y, // outcome vector
const Eigen::MatrixXd &Z, // static sub-matrix of X
const int &num_iterations,
const int &num_chains_less_one,
const int &seed,
...)
{
std::mt19937 rng;
rng(seed);
const int dim_X = get_dim_X(Z,...);
const int n = y.rows();
const int num_chains = num_chains_less_one + 1;
Eigen::ArrayXXd beta_samples;
beta_samples.setZero(num_iterations,num_chains*dim_X);
Eigen::MatrixXd shared_X(n,dim_X*num_chains);
// sampler object only has read access to its arguments
SamplerClass sampler(y,Z,...);
//chain for loop
RcppThread::parallelFor(0, num_chains_less_one,[&beta, &shared_X, &n,&sampler, &dim_X, &rng](unsigned int chain){
// chain specific iteration for loop
for(unsigned int iter_ix = 0; iter_ix < num_iterations ; iter_ix ++){
X.block(0,dim_X*chain,n,dim_X) = sampler.create_X(rng);
beta_samples(iter_ix,dim_X*chain) = sampler.get_beta_sample(X,rng);
}
});
return(beta_samples);
}