3

Overview: I'm interested in parallelizing (across chains) a Gibbs sampler for a non-trivial regression problem that I've already implemented in serial via Rcpp/RcppEigen. I've read the documentation for RcppParallel and RcppThread and I want to know if my understanding of the challenges involved in parallelizing this code are accurate and if my proposed pseudocode using RcppThread is viable.

Programming Challenge: This regression problem requires inverting an updated design matrix each iteration of the Gibbs sampler. Consequently any new matrix (one per chain) needs to be "thread safe". That is, there is no danger of one thread writing to memory that another thread might also try to access. If this is done, I can then draw and store the regression coefficient samples (beta) by giving the Rcpp::parallelFor a unique index with which to assign the samples. I'm wondering where/how would be best to initialize these thread specific matrices?. See below for my overall conceptual understanding and first guess at how I could essentially use the sample principle of assigning samples in parallel, to assign X's in parallel. Note This is assuming that Eigen objects are okay with concurrent index access in the same way I've seen std::vector<>'s memory accessed in the RcppThread documentation.

#include "RcppEigen.h>
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::depends(RcppThread)]] 
// [[Rcpp::depends(RcppEigen)]] 

// Sampler class definition
#include "Sampler.h" 
#include "RcppThread.h"

// [[Rcpp::export]]
Eigen::ArrayXXd fancyregression(const Eigen::VectorXd &y, // outcome vector
                                const Eigen::MatrixXd &Z, // static sub-matrix of X
                                const int &num_iterations,
                                const int &num_chains_less_one,
                                const int &seed,
                                ...)
{ 
   std::mt19937 rng;
   rng(seed);
   const int dim_X = get_dim_X(Z,...);
   const int n = y.rows();
   const int num_chains = num_chains_less_one + 1;

   Eigen::ArrayXXd beta_samples;
   beta_samples.setZero(num_iterations,num_chains*dim_X);

   Eigen::MatrixXd shared_X(n,dim_X*num_chains);

   // sampler object only has read access to its arguments
   SamplerClass sampler(y,Z,...);
    
   //chain for loop
    RcppThread::parallelFor(0, num_chains_less_one,[&beta, &shared_X, &n,&sampler, &dim_X, &rng](unsigned int chain){
        // chain specific iteration for loop
        for(unsigned int iter_ix = 0; iter_ix < num_iterations ; iter_ix ++){
            X.block(0,dim_X*chain,n,dim_X) = sampler.create_X(rng);
            beta_samples(iter_ix,dim_X*chain) = sampler.get_beta_sample(X,rng); 
        }
    });

    return(beta_samples);

}
Adam
  • 55
  • 7

1 Answers1

1

"where/how would be best to initialize these thread specific matrices?"

You're looking for thread specific resources. Here's a barebones example:

#include <Rcpp.h>
#include <RcppParallel.h>
using namespace Rcpp;
using namespace RcppParallel;

// [[Rcpp::depends(RcppParallel)]]
// [[Rcpp::plugins(cpp11)]]

struct Test : public Worker {
  tbb::enumerable_thread_specific<bool> printonce;
  Test() : printonce(false) {}
  
  void operator()(std::size_t begin, std::size_t end) {
    tbb::enumerable_thread_specific<bool>::reference p = printonce.local();
    if(!p) { // print once per thread
      std::cout << 1;
      p= true;
    }
  }
};

// [[Rcpp::export(rng = false)]]
void test() {
  Test x{};
  parallelFor(0, 10000, x);
}

RcppParallel uses TBB under the hood (for the majority of operating systems) so you can use and look up anything in TBB.

Note that since it's a thread local resource has to be assigned somewhere, you'll want to use a class/functor rather than a lambda.

thc
  • 9,527
  • 1
  • 24
  • 39
  • Great. Am I right in thinking that giving each thread concurrent read only access to the sampler object's internal members won't be a problem? – Adam Oct 30 '20 at 19:13
  • 2
    Yes, reading concurrently is generally not a problem for most things, especially if you directly access through the pointer. Concurrent read access is well defined according to the C++ standard. See: https://stackoverflow.com/questions/48854896/concurrent-reads-on-non-atomic-variable – thc Oct 30 '20 at 19:36
  • @thc Hi, can you help me see this question https://stackoverflow.com/questions/65054971/how-to-debug-for-my-gibbs-sampler-of-bayesian-regression-in-r? Thank you. – Hermi Nov 30 '20 at 05:58