Optimizing the use of the C++11 random generator

Question

I'm coding a physics simulation heavily using random numbers, I just profiled my code for the first time so I may be in wrong in reading the output but I see this line coming first:

   %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name  
 90.09     21.88    21.88   265536     0.08     0.08  std::mersenne_twister_engine<unsigned long, 32ul, 624ul, 397ul, 31ul, 2567483615ul, 11ul, 4294967295ul, 7ul, 2636928640ul, 15ul, 4022730752ul, 18ul, 1812433253ul>::operator()()

It seems to mean that generating number generator takes 90% of the time. I had already written a previous post asking if not constructing the random probability distributions at each loop could save me time but after trying and timing it didn't help (Is defining a probability distribution costly? ). Are there common options for optimizing random number generation?

Thank you in advance, my simulations (in its current state) runs for days so reducing on 90% of this computation time would be a significant progress.

it depends on what you are caring for, there are a significant number of different PRNG for different cases, you want something that is linear or not ? something that works on unsigned integers, on floats, on int ... ? — user2485710, Nov 20 '13 at 11:27
have you considered generating the random numbers before running the simulation or is that not possible? — AndersK, Nov 20 '13 at 11:35
@claptrap: Yes I thought about that but unfortunately it doesn't seem possible because I don't know exactly how much random numbers I'll need and the boundaries on my probability densities change during the simulation. — Liam, Nov 20 '13 at 11:41
@user2485710: I need something that works on unsigned integers and floats. What do you mean by linear? My requirements are, so far, something with a really large period and ... fast =) — Liam, Nov 20 '13 at 11:45
@Liam http://en.wikipedia.org/wiki/Linear_congruential_generator if you are doing some serious simulation you probably don't want a _linear_ PRNG — user2485710, Nov 20 '13 at 11:48
@user2485710 Thank you for specifying, indeed I was already unsatisfied with the crand number generator (showing some weird correlation in my output data) I'll probably avoid that in the future ;) — Liam, Nov 20 '13 at 11:50
With threads I indeed meant parallelizing. If your simulations run for days, do they use all CPU cores? Or just one, leaving the other cores CPU cores idle? Because it would be fairly trivial to batches of random numbers ahead of time on one core, while the other is doing the simulations. Some of the scaling might also be done up front, even if the exact distribution isn't fixed yet. E.g. if the distribution is always Gaussian, you can generate a batch of numbers with 0.0 average, 1.0 standard deviation and scale them as needed. — MSalters, Nov 20 '13 at 13:34
@MSalters Yes that could be a solution in a general case, I'm already running my program in parallel, on all the cores if possible. — Liam, Nov 20 '13 at 14:25
you should also look at what is calling RNG - you might be able to reduce the number of calls — Glenn Teitelbaum, Nov 21 '13 at 05:09
@Liam since you don't seem confident about profiling, is there a way you can run a reduced simulation so that it takes eg 3 hours and compare to check if using a faster rng makes the difference? Eg on my computer Jenkins' small prng is 2x faster than MT (but its period is much shorter) — loreb, Nov 21 '13 at 16:25

Walter · Answer 1 · 2013-11-21T10:33:36.683

6

There is always a trade-off between the efficiency, i.e. speed and size (number of bytes of the state), on the one hand and "randomness" of any RNG on the other. The Mersenne twister has quite good randomness (provided you use a high-entropy seed, such as provided by std::random_device), but slow and has large state. std::minstd_rand or std::knuth_b (linear congruential) are faster and ranlux48 (Fibbonacci) yet faster, but are less random (pass fewer test for randomness, i.e. have some non-random spectral properties). Just experiment and test if you're happy with the randomness provided (i.e. have no unsuspected correlations in the random data).

edit: 1 All these RNG are not truly random, of course, and are also not random enough for cryptography. If you need that, use std::random_device, but don't complain about speed. 2 In parallel (which you should consider), use thread_local RNGs, each initialised with another seed.

edited Nov 21 '13 at 10:33

answered Nov 20 '13 at 11:43

Walter

44,150
20
113
196

Thank you for the suggestions, the first random generator I used (crand provided) was showing some weird correlations in my data. I may give your suggestion a try after reading their documentation! – Liam Nov 20 '13 at 11:48
@Liam On the other end of that spectrum is CSPRNGs, which I can all-but guarantee will be *slower*, but you can sleep at night knowing that the correlations you're referring to are not going to be a problem (Dual_EC_DRBG not withstanding; thanks NSA) . Hash-DRBG or HMAC-DRBG coupled on a *strong* cryptographic digest algorithm (SHA256, for example) will definitely give you the scatter you're looking for, but if you think MT is expensive, you haven't seen anything yet. For non-LCG algorithms, however, you may find the MT, with its 2^19937-1 period, to be as good as you're going to likely get. – WhozCraig Nov 20 '13 at 12:27

score 4 · Answer 2 · answered Nov 20 '13 at 12:07

If your code spends most of its time generating random numbers, you may want to take some time to choose the best algorithm for your application and implement it yourself. The Mersenne Twister is a pretty fast algorithm, and has good randomness, but you can always trade off some quality of the random numbers generated for more speed. It will depend on what your simulation requires and on the type of numbers you are generating (ints or floats). If you absolutely need good randomness, Mersenne Twister is probably already one of your best options. Otherwise, you may want to implement a simple linear congruential generator in your code.

Another thing to watch out for is if your code is parallel, you should be using a reentrant version of the random number generator and make sure that different threads use their own internal state variables for their generators. Otherwise, mutexes to avoid overwriting internal state variables of the generator will slow down your code a lot. Many library generators are not reentrant, mind you. If your code is not parallel, you should probably parallelize it and use a separate thread to populate a list of random numbers for your simulation to consume. Another option is to use the GPU to generate random numbers in parallel.

Here are some links comparing the performance of diferent generators: http://www.boost.org/doc/libs/1_38_0/libs/random/random-performance.html https://www.gnu.org/software/gsl/manual/html_node/Random-Number-Generator-Performance.html

"using a reentrant version of the RNG" - since we're talking C++11, that's not a big concern. RNG's are now proper objects, and the usual object rules apply: they're not reentrant, and if threads would need to share one they need a mutex. — MSalters, Nov 20 '13 at 14:59
-1 "*implement it yourself*" is as bad an answer as it can get. — Walter, Nov 21 '13 at 10:29
Implementing something yourself is a perfectly valid strategy when library code is spending 90% of running time. Random number generators aren't that complex. With your own code, you can find and fix bottlenecks, which you can't do in library code. I once doubled the speed of a simulation code by switching away from the standard library and implementing my own heap, map and list structures. Besides, there are [readily available](http://www.math.sci.hiroshima-u.ac.jp/~%20m-mat/MT/VERSIONS/C-LANG/c-lang.html) fast implementations that he can download and integrate into his code. — Guilherme Amadio, Nov 21 '13 at 18:03

score 2 · Answer 3 · answered Nov 20 '13 at 11:32

2

Use a dedicated random number library.

I would suggest WELL512 (link contains the paper and source code).

answered Nov 20 '13 at 11:32

Wilbert

7,251
6
51
91

1

What's the use of that? Code will be less portable and it's not clear at all whether it'll be any faster. You still have the basic problem of trade-off between efficiency and randomness. – Walter Nov 20 '13 at 11:45
2

Yes, of course. On the other hand, Well512 is known to be quite a lot more efficient than mersenne twister, while offering similar guarantees. The currently highest-voted answer says 'implement it yourself' - I feel that using a library dedicated for the task at hand is a better solution. – Wilbert Nov 20 '13 at 13:50
Wilbert, agreed about the *implement it yourself*. – Walter Nov 21 '13 at 10:30

score 1 · Answer 4 · answered Nov 20 '13 at 18:58

1

Marsaglia's KISS RNG is fast and is fine for simulation work. I am assuming that you don't need cryptographic quality.

answered Nov 20 '13 at 18:58

rossum

15,344
1
24
38

pedantic comment: the version you link assumes there are exactly 32 bits in an unsigned long – loreb Nov 21 '13 at 16:19
There are a number of versions of KISS in different sizes. That was just the first one I came across. – rossum Nov 21 '13 at 16:23
I know (I love that rng!) -- my point is that this specific version is made for 32 bits, but an unsigned long may be larger – loreb Nov 21 '13 at 16:34

score 1 · Answer 5 · edited May 23 '17 at 12:25

1

If the randomness requirements allow it, you can use the RDTSC instruction to get random numbers, e.g. int from0to9 = rdtsc() % 10.

edited May 23 '17 at 12:25

Community

1
1

answered Nov 21 '13 at 10:38

ArtemGr

11,684
3
52
85

Optimizing the use of the C++11 random generator

5 Answers5