
This question is twofold. I am translating an R script into C++. The script uses the L'Ecuyer combined multiple recursive generator (CMRG) as its engine (in particular, MRG32k3a), which then returns a random number from the uniform distribution over the interval (0, 1). A minimal example in R is shown below:

seednum<-100                              # set seed
set.seed(seednum, kind="L'Ecuyer-CMRG")   # set RNG engine
runif(1)                                  # draw one number from U(0, 1)

I want to be able to validate my results between the R script and the C++ code (the random numbers generated are only the beginning). I have found that PRNGs with the same seeds across different languages do not necessarily produce the same result (since they may have parameters that the compiler is free to specify), as seen in the SO posts here and here. That is to say, using the same seed, the same engine, and the same distribution may result in different random numbers depending on the particular implementation of the PRNG. A pertinent example between R and C++11 is below. Using the ubiquitous Mersenne-Twister PRNG in R:

seednum<-100
set.seed(seednum, kind="Mersenne-Twister")
runif(1)

This results in a random number of 0.3077661. Doing the same thing in C++11:

#include <iostream>
#include <random>

int main()
{
  unsigned seed = 100;

  std::mt19937 generator (seed);

  std::uniform_real_distribution<double> distribution (0.0, 1.0);

  std::cout << distribution(generator) << std::endl;

  return 0;
}

This results in a random number of 0.671156. I was originally confused by this result, but previous SO questions (linked above) clarified it for me. It would appear that there are parameters being passed to MRG32k3a in R that I need to replicate in C++ in order to generate the same random numbers. The first question is thus, where can I find the documentation on the MRG32k3a implementation in R that specifies these parameters?

The second question regards implementing this generator in C++11. This generator does not appear in the list of predefined engine types in the <random> library of C++11 listed here. An example of MRG32k3a implemented in C can be found here and is shown below:

/*
   32-bits Random number generator U(0,1): MRG32k3a
   Author: Pierre L'Ecuyer,
   Source: Good Parameter Sets for Combined Multiple Recursive Random
           Number Generators,
           Shorter version in Operations Research,
           47, 1 (1999), 159--164.
   ---------------------------------------------------------
*/
#include <stdio.h>

#define norm 2.328306549295728e-10
#define m1   4294967087.0
#define m2   4294944443.0
#define a12     1403580.0
#define a13n     810728.0
#define a21      527612.0
#define a23n    1370589.0

/***
The seeds for s10, s11, s12 must be integers in [0, m1 - 1] and not all 0. 
The seeds for s20, s21, s22 must be integers in [0, m2 - 1] and not all 0. 
***/

#define SEED 100

static double s10 = SEED, s11 = SEED, s12 = SEED,
              s20 = SEED, s21 = SEED, s22 = SEED;


double MRG32k3a (void)
{
   long k;
   double p1, p2;
   /* Component 1 */
   p1 = a12 * s11 - a13n * s10;
   k = p1 / m1;
   p1 -= k * m1;
   if (p1 < 0.0)
      p1 += m1;
   s10 = s11;
   s11 = s12;
   s12 = p1;

   /* Component 2 */
   p2 = a21 * s22 - a23n * s20;
   k = p2 / m2;
   p2 -= k * m2;
   if (p2 < 0.0)
      p2 += m2;
   s20 = s21;
   s21 = s22;
   s22 = p2;

   /* Combination */
   if (p1 <= p2)
      return ((p1 - p2 + m1) * norm);
   else
      return ((p1 - p2) * norm);
}

int main()
{
   double result = MRG32k3a();

   printf("Result with seed 100 is: %f\n", result);

   return (0);
}

As previously noted, I need to use this generator to create an engine that can be fed into the uniform real distribution. The problem is I have no idea how this is done and I can't seem to find any information anywhere (aside from knowing that engines are classes). Are there any C++11 resources available that might help me in such a task? I am not asking for a solution to the problem, but rather pointers that would help me in implementing this myself.

Leigh K

2 Answers


The first question is thus, where can I find the documentation on the MRG32k3a implementation in R that specifies these parameters?

I would use the source: https://github.com/wch/r-source/blob/5a156a0865362bb8381dcd69ac335f5174a4f60c/src/main/RNG.c#L143

The problem is I have no idea how this is done and I can't seem to find any information anywhere (aside from knowing that engines are classes).

The requirements for a RandomNumberEngine can be found here: https://en.cppreference.com/w/cpp/named_req/RandomNumberEngine

However, if you only want to use uniform_real_distribution, it is sufficient to fulfill the UniformRandomBitGenerator requirements:

Expression      Return type     Requirements
G::result_type  T               T is an unsigned integer type
G::min()        T               Returns the smallest value that G's operator()
                                may return. The value is strictly less than
                                G::max().
G::max()        T               Returns the largest value that G's operator() may
                                return. The value is strictly greater than
                                G::min().
g()             T               Returns a value in the closed interval [G::min(),
                                G::max()]. Has amortized constant complexity.  

The main problem is that MRG32k3a is meant to return a floating-point number in (0, 1), while a C++ UniformRandomBitGenerator returns an integer type. Why do you want to integrate with the <random> header?

Additional difficulties you would have to take into account:

Alternatives include using the R source code directly, without integrating it with the <random> header, or linking to libR.

Ralf Stubner

I have found that PRNGs with the same seeds across different languages do not necessarily produce the same result (since they may have parameters that the compiler is free to specify) as seen in the SO posts here and here. That is to say, using the same seed, the same engine, and the same distribution may result in different random numbers depending on the particular implementation of the PRNG.

The first answer merely explains that there is no random number sequence that corresponds universally to a given PRNG seed; the sequence may be documented and implemented differently in different APIs (not just in the compiler and not just at a language level). The second answer is specific to rand and srand in the C language, and applies because rand and srand use an unspecified algorithm.

Although neither answer touches on random number distributions, they too are important if reproducible "randomness" is desired. In that sense, although C++ guarantees the behavior of the engines it provides, it makes the behavior of its distributions (including uniform_real_distribution) implementation-specific.

In general, problems with seeding PRNGs for repeatable "randomness" could have been avoided if RNG APIs used a stable (unchanging) and documented algorithm not only for the seeded PRNG, but also for any random number methods that use that PRNG (which, in the case of R, include runif and rnorm). The latter matters because the reproducibility of "random" sequences depends on how those methods, not just the PRNG itself, are documented.

Depending on whether you wrote the R code in question, an option may be to write the C++ and R code to use a custom PRNG (as you seem to have done yourself in part) and to use custom implemented algorithms for each random number method the original R code uses (such as runif and rnorm). This option may be viable especially since statistical tests are generally insensitive to details of the specific PRNG in use.

Depending on how the R script is written, another option may be to pregenerate the random numbers needed by the code.

Peter O.