0

I am having trouble using replicate() function in R to generate random numbers with Rcpp function. Consider the following function in R:

trial <- function(){rnorm(1)}
replicate(10, trial())

It generate 10 random numbers from Gaussian distribution. It works completely fine and produces result like this:

 [1]  0.7609912 -0.2949613  1.8684363 -0.3358377 -1.6043926  0.2706250  0.5528813  1.0228125 -0.2419092 -1.4761937 

However, I have a c++ function getRan() that generate a random number from Gaussian distribution. I again used replicate to call the function like this:

replicate(10,getRan())

It creates a vector of same number like this:

> replicate(10,getRan())
 [1] -1.374932 -1.374932 -1.374932 -1.374932 -1.374932 -1.374932 -1.374932 -1.374932 -1.374932 -1.374932
> replicate(10,getRan())
 [1] -0.3273785 -0.3273785 -0.3273785 -0.3273785 -0.3273785 -0.3273785 -0.3273785 -0.3273785 -0.3273785 -0.3273785
> replicate(10,getRan())
 [1] -0.7591953 -0.7591953 -0.7591953 -0.7591953 -0.7591953 -0.7591953 -0.7591953 -0.7591953 -0.7591953 -0.7591953
> replicate(10,getRan())
 [1] -1.698935 -1.698935 -1.698935 -1.698935 -1.698935 -1.698935 -1.698935 -1.698935 -1.698935 -1.698935

However, if i call the function multiple times, it works fine:

 getRan()
[1] 1.345227
> getRan()
[1] 0.3555393
> getRan()
[1] 1.587241
> getRan()
[1] 0.5313518

So what is the problem here? Does the replicate() function repeat the same function return from getRan() instead of calling getRan() multiple times? Is it a bug?

PS: I know I can use rnorm(n) to generate n normal random number, however, I want to use the c++ function to do more complex calculations based on generating random numbers

PPS: this is my c++ code:

double getRan(){
  unsigned seed = std::chrono::system_clock::now().time_since_epoch().count();
  std::default_random_engine generator(seed);
  std::normal_distribution<double> distribution (0.0,1.0);
  double epi = distribution(generator);
  return epi;
}
John
  • 309
  • 3
  • 12
  • Interesting. Could you check what happens if you do `lapply(1:10, getRan)`? I suspect this is something to do with `replicate` taking an expression, but beyond this I ain't sure... –  May 26 '18 at 14:43
  • 1
    Post a *reproducible example*. This is so far just a (possibly well-intentioned, but still useless) rant ... – Dirk Eddelbuettel May 26 '18 at 14:50
  • 1
    My crystal ball tells me that in your C++ a RNG is instantiated with the same seed. For more we need a [mcve]. – Ralf Stubner May 26 '18 at 16:36
  • @dash2 it produces: > lapply(1:10,getRan) Error in FUN(X[[i]], ...) : unused argument (X[[i]]) – John May 26 '18 at 16:40
  • @RalfStubner, thanks, i just realised the same thing too, i used time as the seed, so when the function is called with replicate(), the seed is the same, im figuring out how to solve it, here is my C++ code: `double getRan(){ unsigned seed = std::chrono::system_clock::now().time_since_epoch().count(); std::default_random_engine generator(seed); std::normal_distribution distribution (0.0,1.0); double epi = distribution(generator); return epi; }` – John May 26 '18 at 16:42
  • It does not make sense to create a new RNG at every call, even if you manage to fix the seeding issues. Why are you not using R‘s RNG as seen in Dirks answer? BTW, time isn‘t the best seed. Better use `std::random_device`. – Ralf Stubner May 26 '18 at 16:56
  • Apologies... I meant `lapply(1:10, function (x) getRan())`. Anyway, better experts than me are now talking to you. –  May 26 '18 at 17:13
  • thanks guys, Rcpp::rnorm works well, but somehow my std::random_device doesnt work, it produces same result every time, wanna know why – John May 29 '18 at 14:59

2 Answers2

2

Here is a counter-example showing that it works just fine:

Code

trialR <- function() { rnorm(1) }
Rcpp::cppFunction("double trialC() { return R::rnorm(0.0, 1.0); }")
Rcpp::cppFunction("Rcpp::NumericVector trialSugar() { return Rcpp::rnorm(1.0, 0.0, 1.0); }")

set.seed(123); replicate(3, trialR())
set.seed(123); replicate(3, trialC())
set.seed(123); replicate(3, trialSugar())

Output

Via Rscript to ensure fresh session etc pp:

edd@rob:/tmp$ Rscript so50543659.R 
[1] -0.560476 -0.230177  1.558708
[1] -0.560476 -0.230177  1.558708
[1] -0.560476 -0.230177  1.558708
edd@rob:/tmp$ 
Dirk Eddelbuettel
  • 360,940
  • 56
  • 644
  • 725
1

Dirks answer is correct. You should use R's RNG. If you insist on using a RNG in C++, you can use something like this:

#include <Rcpp.h>
// [[Rcpp::plugins(cpp11)]]
#include <random>

namespace {
  std::default_random_engine generator(std::random_device{}());
  std::normal_distribution<double> distribution (0.0,1.0);  
}

// [[Rcpp::export]]
double getRan(){
  return distribution(generator);
}

/*** R
replicate(10,getRan())
*/

This avoids creating a new instance of std::default_random_engine (and std::normal_distribution) at every function call. This is important since the properties of a RNG are only guaranteed for repeated draws from one RNG. Not for repeated draws from different RNGs seeded with (hopefully different) seeds.

BTW, on my system your original code does not produce the same number multiple times. If you are having troubles with std::random_device and are working on windows, you might be affected by this mingw bug. In that case seed by time is the better alternative.

Ralf Stubner
  • 26,263
  • 3
  • 40
  • 75
  • What is the purpose of the namespace declaration if you don't include a name? – thc May 30 '18 at 16:41
  • @thc It‘s an anonymous namespace. See for example here for more https://stackoverflow.com/questions/5793334/c-static-vs-namespace-vs-singleton – Ralf Stubner May 30 '18 at 17:08