2

I want to seed R's internal unif_rand() in a multithreaded environment. The code below generates a 2-column matrix of uniform random numbers within 2 threads. The results are interesting.

struct mtRunif: public RcppParallel::Worker
{
  int Nrow; // number of rows in matrix.
  double *v; // point to the 0th element of the 0th column.
  void operator() (std::size_t st, std::size_t end)
  {
    // st = 0 in the 0th thread, 1 in the 1st thread. 
    double *vst = v + st * Nrow;
    for(int i = 0; i < Nrow; ++i)
    {
      vst[i] = unif_rand();
    }
  }


  mtRunif(int Nrow, double *v): Nrow(Nrow), v(v)
  {
    RcppParallel::parallelFor(0, 2, *this);
  }
};


// [[Rcpp::export]] 
NumericMatrix testSeeding(int sampleSize)
{
  NumericMatrix rst(sampleSize, 2);
  mtRunif(sampleSize, &*rst.begin());
  return rst;
}


/***R
N = 100
set.seed(42); tmp = testSeeding(N) 
set.seed(42); tmp2 = testSeeding(N) 
# see if sequences are identical
range(tmp[, 1] - tmp2[, 1]); range(tmp[, 2] - tmp2[, 2])
# [1] 0 0
# [1] 0 0


N = 1000
set.seed(42); tmp = testSeeding(N) 
set.seed(42); tmp2 = testSeeding(N) 
range(tmp[, 1] - tmp2[, 1]); range(tmp[, 2] - tmp2[, 2])
# [1] -0.9655154  0.8989870
# [1] -0.969356  0.963239
*/

The results suggest set.seed() controls the randomness in all threads for small sample sizes? Initially I expected set.seed() would be effective in no more than 1 thread. I do not want to exploit the conclusion because it could be absolutely wrong. On the other hand, is there a seeding function for unif_rand() akin to std::srand() for std::rand()?

Thank you!

Ralf Stubner
  • 26,263
  • 3
  • 40
  • 75
user2961927
  • 1,290
  • 1
  • 14
  • 22

2 Answers2

4

In short: you can't do this with R for R-internal reasons, and that has been documented extensively.

There are also statistical issues about RNGs and streams. So you most likely want to look into "streaming RNGs" suitable for drawing from multiple threads. There are some on CRAN

as well as the older sprng that is no longer on CRAN.

Steve Walsh
  • 127
  • 1
  • 7
Dirk Eddelbuettel
  • 360,940
  • 56
  • 644
  • 725
  • Addition: The PCG family included in [dqrng](https://cran.r-project.org/package=dqrng) also supports multiple streams. And Xoroshiro128+ and Xoshiro256+ do too, though I haven't included that yet in the C++ version. – Ralf Stubner Jun 04 '18 at 20:03
  • Yes, sorry, Ralf, forgot about the new package. Send me a PR for the HPC task view and I add it there. – Dirk Eddelbuettel Jun 04 '18 at 20:14
2

After advertising dqrng in the comments I realized that I had not written any documentation on how to use the RNGs from that package for parallel usage. So I started a new vignette, that will be part of the next release. Here one of the examples, which is quite similar to what you were trying to do:

#include <Rcpp.h>
// [[Rcpp::depends(dqrng)]]
#include <pcg_random.hpp>
#include <dqrng_distribution.h>
// [[Rcpp::depends(RcppParallel)]]
#include <RcppParallel.h>
// [[Rcpp::plugins(cpp11)]]

struct RandomFill : public RcppParallel::Worker {
  RcppParallel::RMatrix<double> output;
  uint64_t seed;
  dqrng::normal_distribution dist{0.0, 1.0};

  RandomFill(Rcpp::NumericMatrix output, const uint64_t seed) : output(output), seed(seed) {};

  void operator()(std::size_t begin, std::size_t end) {
    pcg64 rng(seed, end); // ctor with seed and stream id
    auto gen = std::bind(dist, rng);
    std::generate(output.begin() + begin * output.nrow(),
                  output.begin() + end * output.nrow(),
                  std::ref(gen));
  }
};

// [[Rcpp::export]]
Rcpp::NumericMatrix parallel_random_matrix(const int n, const int m, const int ncores) {
  Rcpp::NumericMatrix res(n, m);
  RandomFill randomFill(res, 42);
  RcppParallel::parallelFor(0, m, randomFill, m/ncores + 1);
  return res;
}

/*** R
res <- parallel_random_matrix(1e6, 8, 4)
head(res)
*/

Result:

> res <- parallel_random_matrix(1e6, 8, 4)

> head(res)
           [,1]        [,2]        [,3]       [,4]       [,5]       [,6]       [,7]       [,8]
[1,]  0.7114429 -0.19759808 -0.47149983  0.6046378 -0.3709571 -0.8089533  0.8185977 0.49010575
[2,]  0.8721661 -0.47654248  1.10411136 -1.6290995 -1.3276661 -0.2585322 -1.2437521 0.90325167
[3,] -1.4959624  0.61068373 -0.54343828 -0.4623555 -1.1779352 -2.8068283 -0.4341252 1.74490995
[4,]  0.5087201 -0.05175746  0.19007581 -0.7869679  0.9672267 -0.5009787 -0.5283977 1.42487290
[5,] -0.8191448 -0.77348120 -0.03458304  0.7243224  1.0594094 -0.6951184 -0.5456669 0.00894037
[6,]  1.2289518 -2.33539762  0.40222707 -2.3346460 -0.5796549 -0.3092356  2.8961294 0.16773085

BTW, please do not sue std::rand(). If you want to use the standard library, then please use something like std::mt19937 from random with C++11.

Ralf Stubner
  • 26,263
  • 3
  • 40
  • 75
  • Thank you for the answer and package document. My intention was to obtain the same sequence given the same seed in multiple threads, and if I understand correctly, your example tries to achieve the opposite? Is there any way to control the sequence as I intended? – user2961927 Jun 09 '18 at 19:56
  • @user2961927 Sure, just use the standard ctor with just the seed. Then each thread will get the same sequence of random numbers. – Ralf Stubner Jun 09 '18 at 21:12