Fast Parallel Random Number Generation from /dev/{random,urandom}

Question

I have scientific research code that looks like this:

    #define TRIALS 1000000
    #define LEN 10
    int i;
    for(i=0;i<TRIALS;i++) {
        uint8_t r[LEN];
        getRand(r, LEN);
        doExperiment(r);
    }

where I am getting random numbers using /dev/urandom:

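    /* Needs <fcntl.h>, <unistd.h> and <stdint.h>. */
    void getRand(uint8_t *r, int len) {
        int fd = open("/dev/urandom", O_RDONLY);  /* reopened on every call */
        read(fd, r, len);                         /* return value is not checked */
        close(fd);
    }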

Note: I do not require my experiment to be repeatable so do not care about having a fixed seed. However, it is mission critical that my random numbers are high quality (reasonably close to being cryptographically secure) so that the statistics of my results are valid. Speed is also very important.

I plan to parallelise this code, firstly using OpenMP by just sticking a #pragma omp parallel for in front of my loop.

Question: What is the best way to generate random numbers concurrently (feel free to suggest not using /dev/urandom)? Should I put a mutex around calls to getRand() and allow my code to serialise on getting random numbers, should I attempt to generate all the random numbers I require up front beforehand, or should I have a separate thread which fills a buffer of random numbers which is read from (with a mutex lock) in a producer-consumer fashion? Is the best solution different if I were to use /dev/random instead, which is a finite resource and might block?

I have read through the related posts on generating random numbers in parallel, but I want to ask specifically about using /dev/{urandom,random}.

`uint8_t r[LEN]; r = getRand(r, LEN);` r is not an assignable lvalue, so this should not even compile. BTW `void getRand()` returns void, so there will be no rvalue either ... — wildplasser, May 11 '14 at 11:08
Please limit yourself to one question per question, especially when at least one of your questions isn't really related to the other two. And make your code examples sensible. Your code example calling the `getRand` function is complete nonsense as written. — talonmies, May 11 '14 at 11:08
Apologies for the silly mistake in my code example, hopefully it is more sensible now! I have removed the side-question relating to CUDA. — Dave White, May 11 '14 at 11:13
Possible duplicate of http://stackoverflow.com/q/14923902/681865 — talonmies, May 11 '14 at 11:13
The function `void getRand()` performs *three* system calls. If you want to "speed it up", you should try to reduce the number of system calls. — wildplasser, May 11 '14 at 11:15
AES-CTR is fast as hell on CPUs with AES-NI instructions. On Haswell it finally broke the one cycle-per-byte barrier. — CodesInChaos, May 11 '14 at 12:06

score 0 · Accepted Answer · edited May 23 '17 at 12:31

To consolidate a few comments...

Calling a getRand() function that opens, reads from and closes /dev/urandom on every trial is slow and should be avoided: each call costs three system calls, which is a lot of overhead for a handful of bytes. It is better to read much larger chunks from /dev/urandom and buffer them, or to use /dev/urandom to seed a software PRNG.
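As a minimal sketch of the buffering idea, assuming Linux and a buffer size picked arbitrarily: the names `rand_pool`, `pool_init`, `pool_refill`, `pool_get` and `POOL_SIZE` are hypothetical and not part of any library. One open/read/close sequence fills a large buffer, and later requests are served from memory until it runs dry.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define POOL_SIZE 65536  /* hypothetical buffer size; tune for your workload */

/* Hypothetical pool of bytes read from /dev/urandom and handed out in slices. */
typedef struct {
    uint8_t buf[POOL_SIZE];
    size_t  pos;  /* index of the next unused byte; POOL_SIZE means "empty" */
} rand_pool;

/* Mark the pool as empty so the first request triggers a refill. */
static void pool_init(rand_pool *p) { p->pos = POOL_SIZE; }

/* Refill the whole pool with a single open/read/close sequence.
   Returns 0 on success, -1 on failure. */
static int pool_refill(rand_pool *p) {
    int fd = open("/dev/urandom", O_RDONLY);
    if (fd < 0) return -1;
    size_t got = 0;
    while (got < POOL_SIZE) {
        ssize_t n = read(fd, p->buf + got, POOL_SIZE - got);
        if (n < 0 && errno == EINTR) continue;   /* interrupted: retry */
        if (n <= 0) { close(fd); return -1; }    /* error or unexpected EOF */
        got += (size_t)n;
    }
    close(fd);
    p->pos = 0;
    return 0;
}

/* Copy len random bytes into r, refilling when too few bytes remain.
   Leftover bytes are discarded on refill. len must not exceed POOL_SIZE. */
static int pool_get(rand_pool *p, uint8_t *r, size_t len) {
    if (len > POOL_SIZE) return -1;
    if (POOL_SIZE - p->pos < len && pool_refill(p) != 0) return -1;
    memcpy(r, p->buf + p->pos, len);
    p->pos += len;
    return 0;
}
```

With LEN = 10 this replaces roughly 6,500 open/read/close sequences with one. Under OpenMP, giving each thread its own `rand_pool` (declared inside the parallel region) avoids the need for a mutex, at the cost of one buffer per thread.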

In the latter case, OpenSSL's RAND_bytes() can be used, which returns "cryptographically strong random" values. It can be configured to use Intel's DRNG via the RDRAND instruction (see http://wiki.openssl.org/index.php/Random_Numbers#Hardware), which is discussed here. The DRNG is built on a hardware implementation of AES in counter mode; AES in counter mode is also available to software through the AES-NI instruction set, which can be accessed through OpenSSL's EVP API. According to Intel, the RDRAND-enabled version of OpenSSL outperformed the non-RDRAND version by an order of magnitude.

Two approaches (discussed in this post) for generating random numbers for multiple threads are either to seed a separate PRNG for each thread from /dev/urandom, or to seed one PRNG from /dev/urandom and then seed each thread's PRNG from that one.
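The two seeding schemes might be sketched as follows. Everything here is illustrative: the `prng` type and the helper names are hypothetical, and the generator is splitmix64, used only as a stand-in because it fits in a few lines. It is NOT cryptographically strong, so for the quality requirements in the question it would need to be replaced by a CSPRNG (for example AES in counter mode) behind the same interface.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Placeholder generator (splitmix64). NOT cryptographically strong:
   substitute a CSPRNG if the statistics depend on the quality. */
typedef struct { uint64_t s; } prng;

static uint64_t prng_next(prng *g) {
    uint64_t z = (g->s += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Read one 64-bit seed from /dev/urandom. Returns 0 on success. */
static int urandom_seed(uint64_t *seed) {
    int fd = open("/dev/urandom", O_RDONLY);
    if (fd < 0) return -1;
    ssize_t n = read(fd, seed, sizeof *seed);
    close(fd);
    return n == (ssize_t)sizeof *seed ? 0 : -1;
}

/* Approach 1: each thread's generator gets its own seed from /dev/urandom. */
static int seed_each_from_urandom(prng *gens, int nthreads) {
    for (int t = 0; t < nthreads; t++)
        if (urandom_seed(&gens[t].s) != 0) return -1;
    return 0;
}

/* Approach 2: one master generator is seeded from /dev/urandom and then
   produces the seeds for the per-thread generators. */
static int seed_each_from_master(prng *gens, int nthreads) {
    prng master;
    if (urandom_seed(&master.s) != 0) return -1;
    for (int t = 0; t < nthreads; t++)
        gens[t].s = prng_next(&master);
    return 0;
}
```

Seeding happens once, before the parallel loop. Inside the loop each thread would draw only from its own entry, e.g. `gens[omp_get_thread_num()]`, so no locking is needed between threads.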

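It should be noted, though, that OpenSSL is not thread safe out of the box: versions before 1.1.0 need locking callbacks to be installed before it is used from multiple threads. This post gives a good example of using OpenSSL with OpenMP.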

On Ivy Bridge or later processors you can use [_rdrand16_step(), _rdrand32_step() and _rdrand64_step()](http://stackoverflow.com/questions/20970643/generating-random-numbers-cpu-vs-gpu-which-currently-wins/20970894#20970894) in parallel with OpenMP. — Z boson, May 12 '14 at 09:52
