
I am testing the performance of random number generators in C++ and have come across some very strange results that I do not understand.

I have tested std::rand against std::uniform_real_distribution driven by std::minstd_rand.

Code for timing std::rand

auto start = std::chrono::high_resolution_clock::now();

for (int i = 0; i < 1000000; ++i)
    std::rand();

auto finish = std::chrono::high_resolution_clock::now();
std::chrono::duration<double> elapsed = finish - start;
std::cout << "Elapsed time: " << elapsed.count() * 1000 << " ms\n";

Code for timing std::uniform_real_distribution with std::minstd_rand

std::minstd_rand Mt(std::chrono::system_clock::now().time_since_epoch().count());
std::uniform_real_distribution<float> Distribution(0, 1);

auto start = std::chrono::high_resolution_clock::now();

for (int i = 0; i < 1000000; ++i)
    Distribution(Mt);

auto finish = std::chrono::high_resolution_clock::now();
std::chrono::duration<double> elapsed = finish - start;
std::cout << "Elapsed time: " << elapsed.count() * 1000 << " ms\n";

When compiling with Microsoft Visual Studio 2019, on a Dell Latitude 7390 (I7-8650U 1.9Ghz) I get the following speeds:

std::rand -> Elapsed time: 45.7106 ms
std::uniform_real_distribution -> Elapsed time: 65.7437 ms

I have compiler optimizations turned on, with the additional command-line option -D__FMA__.

However when compiling with g++ on a MacBook Air on MacOS High Sierra (1.4Ghz i5) I get the following speeds:

std::rand -> Elapsed time: 9.4547 ms
std::uniform_real_distribution -> Elapsed time: 7.9e-05 ms

using terminal command "g++ prng.cpp -o prng -std=c++17 -O3"

Another problem is that on the Mac, the measured time for uniform_real_distribution changes depending on whether or not I print the generated value.

So

std::minstd_rand Mt(std::chrono::system_clock::now().time_since_epoch().count());
std::uniform_real_distribution<float> Distribution(0, 1);

float num;

auto start = std::chrono::high_resolution_clock::now();

for (int i = 0; i < 1000000; ++i)
    num = Distribution(Mt);

auto finish = std::chrono::high_resolution_clock::now();
std::chrono::duration<double> elapsed = finish - start;
std::cout << "Elapsed time: " << elapsed.count() * 1000 << " ms\n";
std::cout << num << '\n';

would give me a time of 5.82409 ms,

whereas without printing I get 7.9e-05 ms. Note that printing only affects the test for uniform_real_distribution; I do not need to do this for std::rand. I also tested using Mersenne instead of std::minstd_rand, and it does not suffer from the same issue.

I originally thought that this was the compiler optimizing away the uniform_real_distribution call when the result wasn't stored or printed, since an unused value can be omitted. But then why doesn't the compiler do the same for std::rand, and why do these random functions run faster on the Mac than on Windows?

EDIT: For clarification, "Mersenne" refers to std::mt19937_64 being used instead of std::minstd_rand as the engine for uniform_real_distribution.

name

1 Answer


All of the distributions in the C++ standard library (including uniform_real_distribution) use an implementation-defined algorithm. (The same applies to std::rand, which defers to the C standard's rand function.) Thus, it's natural that there would be performance differences between these distributions in different implementations of the C++ standard library. See also this answer.

You may want to try testing whether there are performance differences in the C++ random engines (such as std::minstd_rand and std::mt19937), which do specify a fixed algorithm in the C++ standard. To do so, generate a random number in the engine directly and not through any C++ distribution such as uniform_int_distribution or uniform_real_distribution.


I originally thought that this was compiler optimizations omitting the uniform_real_distribution when it wasn't stored / printed as the variable isn't used and thus can be omitted but then why doesn't the compiler do the same for std::rand[?]

I presume the compiler can do this optimization because, in practice, the engines and distributions of the C++ standard library are implemented as C++ code that is visible to the compiler, so it can inline that code and discard work whose result is never used. std::rand, by contrast, is typically a precompiled library function whose implementation is not visible to the compiler, which limits the optimizations the compiler can perform on calls to it.

Peter O.
  • The `uniform_real_distribution` test on both the Mac and the Dell uses std::minstd_rand; I tested both that and std::rand, and std::minstd_rand shows very different speeds. EDIT: Unless I am misunderstanding what you mean by using minstd_rand, as I am still using uniform_real_distribution but passing std::minstd_rand as what I assume is the random engine. – name Nov 06 '19 at 11:39
  • Generate a random number in the random engine directly and not through any C++ distribution such as `uniform_int_distribution` or `uniform_real_distribution`, and see whether you find any performance difference. – Peter O. Nov 06 '19 at 11:46
  • By using std::minstd_rand directly I get 16 ms on the Dell and 6 ms on the Mac with the same test code as above, just calling the random engine directly instead, and with std::mt19937_64 I get 12 ms on the Dell and 5 ms on the Mac. Is the difference here just due to hardware if the implementation is the same? And for std::minstd_rand I still have to store and print the value to get what I assume is the correct timing (I'm guessing this is due to how minstd_rand is implemented). – name Nov 06 '19 at 11:57
  • I'll mark this as the answer for the difference in performance, but do you perhaps know why I need to store and print the value returned from std::minstd_rand? When I don't, I get very small timings of 7.9e-05 ms. – name Nov 06 '19 at 12:05
  • In general, running time figures between hardware platforms are not comparable with each other. – Peter O. Nov 06 '19 at 12:15