5

I have a bunch of threads, each one needs a thread safe random number. Since in my real program threads are spawned and joined repeatedly, I wouldn't like to create random_device and mt19937 each time I enter a new parallel region which calls the same function, so I put them as static:

#include <iostream>
#include <random>
#include <omp.h>

void test(void) {
    static std::random_device rd;
    static std::mt19937 rng(rd());
    static std::uniform_int_distribution<int> uni(1, 1000);

    int x = uni(rng);
#   pragma omp critical
    std::cout << "thread " << omp_get_thread_num() << " | x = " << x << std::endl;
}

int main() {
#   pragma omp parallel num_threads(4)
    test();
}

I cannot place them as threadprivate because of Error C3057: dynamic initialization of 'threadprivate' symbols is not currently supported. Some sources say random_device and mt19937 are thread safe, but I haven't managed to find any docs which would prove it.

  1. Is this randomization thread safe?
  2. If no, which of the static objects can be left as static to preserve thread safety?
Victor Eijkhout
  • 5,088
  • 2
  • 22
  • 23
Kaiyakha
  • 1,463
  • 1
  • 6
  • 19
  • First guess: No. Because threads can be interrupted during the non-atomic operation of `uni(rng)` and interfere with random number generation. `rd`, `rng` and `uni` are objects and their states are shared across multiple threads without synchronization. – knivil May 03 '22 at 11:20
  • You can certainly make them `static thread_local` if that helps, not sure about openmp compatibility. If `rd` is used only for initialization, consider `rng(std::random_device{}()))`; – Quimby May 03 '22 at 11:22
  • @Quimby According to [this](https://stackoverflow.com/questions/22794382/are-c11-thread-local-variables-automatically-static), `static thread_local` does not make an instance live longer than its thread, here `static` is about linkage, not storage duration. – Kaiyakha May 03 '22 at 11:42
  • Your static initialization is safe, it is `int x = uni(rng)` that is not going to be threadsafe. You'll need a mutex or something like that to stop multiple threads from calling `uni(rng)` at the same time. – NathanOliver May 03 '22 at 11:56
  • @NathanOliver mutex makes code sequential and may significantly affect performance, making generators non-static would be even less expensive than mutex – Kaiyakha May 03 '22 at 12:01
  • That's what a profiler is for. Try both and see which is actually faster. The rule of multithreading is if you have shared state, and one or more writer, then you need some sort of synchronization around that shared state. So you either get rid of the shared state or you provide the synchronization, there really aren't any other options. – NathanOliver May 03 '22 at 12:04
  • @NathanOliver I'm not sure how `mt19937` and `uni` work, do they have some writers under the hood? If they do, I cannot have them as static – Kaiyakha May 03 '22 at 12:08
  • 4
    Yes, both `mt19937` and `uni` have state inside them that they modify to create the random numbers. Because they modify themselves, they need to be protected if you are sharing them between threads. – NathanOliver May 03 '22 at 12:10
  • @Kaiyakha yes, of course it doesn't. `thread_local` would make one generator for each thread. Perhaps I misunderstood the question then. – Quimby May 03 '22 at 12:21
  • 2
    *Some sources say random_device and mt19937 are thread safe, but I haven't managed to find any docs which would prove it.* -- Those "docs" are the C++11 standard. Local `static` variables are guaranteed to be accessed in a thread-safe manner, otherwise [Meyers singletons](https://stackoverflow.com/questions/17712001/how-is-meyers-implementation-of-a-singleton-actually-a-singleton) would not work. It is the `uni(rng)` that is the issue. – PaulMcKenzie May 03 '22 at 12:41
  • @NathanOliver "either get rid of the shared state" That much is clear. But the question is how? How can a thread private (in the informal sense) generator persist across parallel regions? Or is recreating the generators each time (and then worrying about statistical significance) the only way out? – Victor Eijkhout May 03 '22 at 13:40
  • Does this answer your question? [How to generate random numbers in parallel?](https://stackoverflow.com/questions/4287531/how-to-generate-random-numbers-in-parallel) – paleonix May 03 '22 at 13:59
  • 2
    @paleonix That's not the right language. (Even if a solution in that language would work for this language). – cigien May 03 '22 at 14:02
  • @Quimby `thread_local` is not part of OpenMP, which is the subject of this question. C++ threads and OpenMP threads are completely different things. I'm removing the "multithreading" tag from this question. – Victor Eijkhout May 03 '22 at 14:15
  • @cigien I agree that it isn't a duplicate, but it still demonstrates how to do it in theory, although it looks a bit different in practice with the C++ generators having state. I should have just commented it as a FYI. So what I wanted to say is that I would also rather use per-thread generators. A solution I have seen before is to use a shared array of generators. One just needs to be very careful to avoid false sharing / cache thrashing. – paleonix May 03 '22 at 15:01
  • @Quimby btw why do we need `{}` in `rng(std::random_device{}()))` – Kaiyakha May 05 '22 at 09:33
  • @Kaiyakha Because `{}` creates a temporary object and `()` calls its call operator. – Quimby May 05 '22 at 10:39

2 Answers2

1

Here is a different approach. I keep a global seeding value so that the random_device is only used once. Since using it can be very slow, I think it is prudent to only use it as rarely as possible.

Instead we increment the seeding value per thread and also per use. That way we avoid the birthday paradox and we minimize the thread-local state to a single integer.

#include <omp.h>

#include <algorithm>
#include <array>
#include <random>


using seed_type = std::array<std::mt19937::result_type, std::mt19937::state_size>;


namespace {

  seed_type init_seed()
  {
    seed_type rtrn;
    std::random_device rdev;
    std::generate(rtrn.begin(), rtrn.end(), std::ref(rdev));
    return rtrn;
  }
  
}
/**
 * Provides a process-global random seeding value
 *
 * Thread-safe (assuming the C++ compiler if standard-conforming.
 * Seed is initialized on first call
 */
seed_type global_seed()
{
  static seed_type rtrn = init_seed();
  return rtrn;
}
/**
 * Creates a new random number generator
 *
 * Operation is thread-safe, Each thread will get its own RNG with a different
 * seed. Repeated calls within a thread will create different RNGs, too.
 */
std::mt19937 make_rng()
{
  static std::mt19937::result_type sequence_number = 0;
# pragma omp threadprivate(sequence_number)
  seed_type seed = global_seed();
  static_assert(seed.size() >= 3);
  seed[0] += sequence_number++;
  seed[1] += static_cast<std::mt19937::result_type>(omp_get_thread_num());
  seed[2] += static_cast<std::mt19937::result_type>(omp_get_level());
  std::seed_seq sseq(seed.begin(), seed.end());
  return std::mt19937(sseq);
}

See also this: How to make this code thread safe with openMP? Monte Carlo two-dimensional integration

For the approach of just increment the seeding value, see this: https://www.johndcook.com/blog/2016/01/29/random-number-generator-seed-mistakes/

Homer512
  • 9,144
  • 2
  • 8
  • 25
  • 1
    If the quality of the random numbers is important I would advise reading [Melissa O'Neills PCG blog](https://www.pcg-random.org/posts/developing-a-seed_seq-alternative.html). – paleonix May 03 '22 at 15:18
  • You might usefully also consider a random number generator which is intended to be used in parallel. See, for instance, [Parallel Random Numbers: As Easy as 1, 2, 3](http://www.thesalmons.org/john/random123/papers/random123sc11.pdf) which is now implemented in many maths libraries. – Jim Cownie May 05 '22 at 07:41
0

I think threadprivate is the right approach still, and you can obviate the initialization problem by doing a parallel assignment later.

static random_device rd;
static mt19937 rng;
#pragma omp threadprivate(rd)
#pragma omp threadprivate(rng)

int main() {

#pragma omp parallel
  rng = mt19937(rd());

#pragma omp parallel
  {
    stringstream res;
    uniform_int_distribution<int> uni(1, 100);
    res << "Thread " << omp_get_thread_num() << ": " << uni(rng) << "\n";
    cout << res.str();
  }

  return 0;
}

Btw, note the stringstream: OpenMP has a tendency to split output lines at the << operators.

Victor Eijkhout
  • 5,088
  • 2
  • 22
  • 23