5

I am trying to find an efficient way to implement a uniform(0,1) distribution. Since I have to generate a very large number of samples, I chose mt19937 as engine. I am using the version from the boost library. My question is: what is the difference between using the output of the engine itself vs using uniform_real_distribution?

Option #1

std::random_device rd;
boost::mt19937 gen(rd());
boost::random::uniform_real_distribution<double> urand(0, 1);

for ( int i = 0; i < 1E8; i++ ) {
    u = urand(gen);
}

Option #2

std::random_device rd;
boost::mt19937 gen(rd());

for ( int i = 0; i < 1E8; i++ ) {
    u = (double) gen()/gen.max();
}

From my tests, Option #2 is considerably better than Option #1 in terms of runtime. Is there any reason I should pick Option #1 over Option #2?

user3278488
  • 95
  • 1
  • 6
  • 1
    Without looking at the implementation itself it's impossible to know for sure, but I'd assume that `uniform_real_distribution` uses more bits to ensure that every possible floating point result in the range can be returned. Option #2 will have holes that are `1/gen.max()` apart. – Mark Ransom Dec 10 '14 at 18:34

2 Answers2

1

I don't know the underlying implementation of urand(), but using the result of a division is likely to produce bias in the low-order bits as a quantisation effect. If gen.max() isn't large then "low-order bits" may be very many or most of the bits of the result.

The performance disparity may come from producing properly distributed random numbers. If double is overly precise for your needs then perhaps using float might allow it to run more efficiently.

sh1
  • 4,324
  • 17
  • 30
0

My question is: what is the difference between using the output of the engine itself vs using uniform_real_distribution?

In your first option urand() has the range [0,1), whereas your second option has the range [0,1] (if boost::mt19937::min() == 0, which usually holds).

Zeta
  • 103,620
  • 13
  • 194
  • 236
  • Thanks! I didn't notice the difference in range. It seems then that the second option is better both in terms of performance and quality of the random number generator? – user3278488 Dec 09 '14 at 15:53