2

I am trying to understand the correct usage of parallel random number generation. After having consulted different resources, I wrote a simple code that seems to work, but it would be nice if someone could confirm my understanding.

For the sake of pointing out the difference and relationship between rand() and rand_r(), let's solve:

Produce a random integer N, then extract N random numbers in parallel and compute their average.

This is my proposal (checking and free omitted), small integers on purpose:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <omp.h>

int main() {
        /* Initialize and extract an integer via rand() */
        srand(time(NULL));
        int N = rand() % 100;

        /* Storage array */ 
        int *extracted = malloc(sizeof(int) * N);

        /* Initialize N seeds for rand_r, which is completely
         * independent on rand and srand().
         * (QUESTION 1: is it right?)
         * Setting the first as time(NULL), and the others
         * via successive increasing is a good idea (? QUESTION 2)*/
        unsigned int *my_seeds = malloc(sizeof(unsigned int) * N);
        my_seeds[0] = time(NULL);
        for (int i = 1; i < N; ++i) {
                my_seeds[i] = my_seeds[i - 1] + 1;
        }

        /* The seeds for rand_r are ready:
         * extract N random numbers in parallel */
        #pragma omp parallel for
        for (int i = 0; i < N; ++i) {
                extracted[i] = rand_r(my_seeds + i) % 10;
        }

        /* Compute the average: must be done sequentially, QUESTION 3,
         * because of time-sincronization in reading/writing avg */
        double avg = 0;
        for (int i = 0; i < N; ++i) {
                avg += extracted[i];
        }
        avg /= N;
        printf("%d samples, %.2f in average.\n", N, avg);
        return 0;
}

As my comments in the code try to highlight, it would be helpful to understand if:

  1. the simultaneous usage of rand and rand_r is in this case correct;

  2. the seed's initialization for rand_r, i.e. the variable my_seeds, is fine;

  3. the for parallelization and related variable usage is safe.

I hope to sum up various doubts in a single, simple, ready-to-use example, after having read various tutorials / sources online (this website included).

duccio
  • 143
  • 6
  • Are you asking if you can use `rand` and `rand_r` together (as long as `rand` is not used concurrently, and `rand_r` with the same seed variable is not used concurrently) ? Or are you asking whether the way you handle the different seeds is correct ? Or something else ? – Sander De Dycker Oct 21 '19 at 08:22
  • I am asking both of them, sorry for the imprecision. I'll edit soon. – duccio Oct 21 '19 at 08:25

2 Answers2

1
  1. the simultaneous usage of rand and rand_r is in this case correct;

As long as :

  • rand is not used concurrently (which in your code is ok - you're only calling it once in the main thread)
  • rand_r with the same seed variable is not used concurrently (which in your code is ok - you're only calling it once for each seed variable)

there are no issues with thread safety.

  1. the seed's initialization for rand_r, i.e. the variable my_seeds, is fine;

You have a separate seed for every (potentially) concurrent use of rand_r. As long as the same seed variable isn't used for concurrent calls to rand_r (which in your code doesn't happen), all is good.

  1. the for parallelization and related variable usage is safe.

Each "thread" in your code has its own seed variable for rand_r and its own result variable. So there's no concurrency issue wrt. that.

Side note : rand_r has been obsoleted, and both rand and rand_r are relatively low quality prng's. Depending on your needs, it might be worth it to investigate alternative prng's.

Sander De Dycker
  • 16,053
  • 1
  • 35
  • 40
  • Thank you very much: that's what I was looking for; Max's answer is, in my opinion, very good, too. I accepted yours because you replied faster and commented since the beginning (the +1 might not be shown because I have not yet enough reputation). – duccio Oct 21 '19 at 09:17
  • Rather than using rand or rand_r, it's worth using a random number generator which is intended for use in parallel, and which can give you non-overlapping guarantees. Look at "Parallel Random Numbers: As Easy as 1, 2, 3" (https://www.thesalmons.org/john/random123/papers/random123sc11.pdf). There are implementations of parallel random number generators in the NAG library, Intel MKL (which is free to everyone) and, no doubt other maths libraries. – Jim Cownie Oct 23 '19 at 08:14
0
  1. There is nothing incorrect about using both, as long as rand is not called concurrently.

  2. It's unclear what you consider as "fine" or "a good idea". It's fine in the sense that you will get different random number sequences produced for each seed. It's a bit nonsensical in that you only generate a single random number from each seed (which means the generated numbers will all likely follow a very predictable pattern, as do your seeds).

  3. There are no race conditions, so it is safe. Parallelization for < 100 calls of a (presumably) simple arithmetic method is not going to be worth it from a performance perspective, but that's not what you're asking about.

All in all, this code has no formal correctness problems. Whether it fulfills whatever purpose you would like it to fulfill is a different question. Take note that rand (and rand_r) tend to be only very superficially random1, so the predictability mentioned in point 2 is just more of the same. See also Why is rand()%6 biased? for yet another quality-of-randomness issue in the code. In other words, be aware that the randomness you are producing here is lacking for many applications.

1Assuming that unsigned int has 32 bits, there are only 32 bits of state for the PRNG, so it will repeat after (at most) 232 calls anyway (which is trivial to brute-force).

Max Langhof
  • 23,383
  • 5
  • 39
  • 72
  • Thanks for the answer: very clear and precise; yes, you surely got the point: this is just an example for learning the technique before moving on more interesting cases. +1 for sure (maybe not visualized because my reputation is not enough), but I accepted Sander's answer because he replied faster and commented since the moment I posted here. – duccio Oct 21 '19 at 09:14
  • @duccio (Actually my answer is the earlier one, but I fully support accepting his answer.) – Max Langhof Oct 21 '19 at 09:16
  • argh, my apologies, I misread! If fine for you, I'll leave my choice unchanged: maybe because he commented since the very beginning, he was meanwhile preparing the answer. Anyway I hope not to cause any issue; if you were here, I'd surely offer a coffee to both of you. Thank you again! – duccio Oct 21 '19 at 09:28
  • @duccio As stated, I'm completely for accepting his answer! – Max Langhof Oct 21 '19 at 09:44
  • "Assuming that unsigned int has 32 bits, there are only 32 bits of state for the PRNG" is unsupported by the C standard. `srand(unsigned int seed)` does limit the seeding to at most `UINT_MAX+1` states, yet the state used by `rand()` may be much more. The seeding of those extra bits could simply be 0. – chux - Reinstate Monica Oct 21 '19 at 23:02
  • @chux I am talking about `rand_r`. Unless you want to argue that `rand_r` has some additional secret thread-safe state and asks you to pass in the state pointer just for fun, that one is indeed limited to 32 bits of state. – Max Langhof Oct 22 '19 at 07:27