I am trying to generate random numbers with the PCG method. I have tested two different implementations, given as block 1 and block 2 in the code below. Block 1 is correct and scales as expected with the number of threads. Block 2 does not scale properly, and I do not understand what is wrong with it.
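#include <chrono>
#include <cstdint>   // intptr_t, UINT32_MAX
#include <cstdio>    // printf (its address is mixed into the seed)
#include <ctime>     // time
#include <iostream>
#include <omp.h>
#include "../include_random/pcg_basic.hpp"
int main()
{
    // Placeholder values: the ones used for the timings are not shown here.
    const int threadSNumber = 4;   // varied from 1 to 4 in the timings
    const int N = 1000000000;      // number of random draws
    double sum = 0.0;
    std::chrono::system_clock::time_point startingTime;
    // /* Block 1 */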
    // omp_set_num_threads(threadSNumber);
    // startingTime = std::chrono::system_clock::now();
    // #pragma omp parallel
    // {
    //     int threadID = omp_get_thread_num();
    //     pcg32_random_t rng;
    //     pcg32_srandom_r(&rng, time(NULL) ^ (intptr_t)&printf, (intptr_t)&threadID);
    //     // uint32_t bound = 1;
    //     #pragma omp for reduction(+:sum)
    //     for (int step = 0; step < N; step++)
    //     {
    //         // sum += 0.5 - (double)pcg32_boundedrand_r(&rng, bound);
    //         sum += 0.5 - ((double)pcg32_random_r(&rng) / (double)UINT32_MAX);
    //     }
    // }
    /** Block 2 **/
    omp_set_num_threads(threadSNumber);
    // One generator state per thread, stored contiguously in a heap array.
    pcg32_random_t *rng = new pcg32_random_t[threadSNumber];
    // Seed each thread's generator once, outside the timed region.
    #pragma omp parallel
    {
        int threadID = omp_get_thread_num();
        pcg32_srandom_r(&rng[threadID], time(NULL) ^ (intptr_t)&printf, (intptr_t)&threadID);
    }
    startingTime = std::chrono::system_clock::now();
    // Timed region: each thread draws from its own array element.
    #pragma omp parallel
    {
        int threadID = omp_get_thread_num();
        #pragma omp for reduction(+:sum)
        for (int step = 0; step < N; step++)
        {
            sum += 0.5 - ((double)pcg32_random_r(&rng[threadID]) / (double)UINT32_MAX);
        }
    }
    delete[] rng;
    /****/
    auto end = std::chrono::system_clock::now();
    auto diff = end - startingTime;
    double total_time = std::chrono::duration<double, std::ratio<1>>(diff).count();
    std::cout << "The result of the sum is " << sum / N << "\n" << std::endl;
    std::cout << "# Total time: " << (int)total_time / 3600 << "h " << ((int)total_time % 3600) / 60 << "m " << (int)total_time % 60 << "s (" << total_time << " s)" << std::endl;
    return 0;
}
Block 1 scales as expected with the number of threads, but block 2 does not:
# threads     1       2       3       4
block 1 (s)   3.27    1.64    1.12    0.83
block 2 (s)   4.60    13.7    8.28    10.9
This is a minimal example that reproduces the issue. It is extracted from a larger function that is part of a bigger code base.
I want to seed the generators only once and then, at every time step, compute a batch of random numbers that are used in another function (not summed as they are here; the sum is only there to record something). I could use block 1, but that means re-seeding at every time step instead of once. Moreover, I do not understand the scaling of block 2.
What is wrong with block 2? Why do I get this scaling? The threads are not using the same rng, so there should be no data race, unless I am misunderstanding something.
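One hypothesis that may be worth testing for block 2 is false sharing rather than a data race: pcg32_random_t holds two 64-bit fields (16 bytes), so the states of four threads can sit in a single 64-byte cache line, and every call that updates one state may invalidate that line for the other threads. The sketch below is only an illustration of one way to check this while still seeding once: each thread copies its state into a local variable, draws from the local copy, and writes it back at the end of the time step. DemoRng, demo_next, demo_seed and time_step_sum are hypothetical names; the generator is a self-contained PCG32-style stand-in, not the pcg_basic.hpp API, and the 64-byte line size is an assumption about the hardware.

```cpp
#include <cstdint>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for pcg32_random_t (two 64-bit fields, 16 bytes).
struct DemoRng {
    std::uint64_t state;
    std::uint64_t inc;
};

// PCG32-style step: advance the LCG state, then permute the old state.
inline std::uint32_t demo_next(DemoRng& rng) {
    const std::uint64_t old = rng.state;
    rng.state = old * 6364136223846793005ULL + (rng.inc | 1u);
    const std::uint32_t xorshifted =
        static_cast<std::uint32_t>(((old >> 18u) ^ old) >> 27u);
    const std::uint32_t rot = static_cast<std::uint32_t>(old >> 59u);
    return (xorshifted >> rot) | (xorshifted << ((32u - rot) & 31u));
}

// Seed one generator with an initial state and a stream selector.
inline void demo_seed(DemoRng& rng, std::uint64_t seed, std::uint64_t stream) {
    rng.state = 0u;
    rng.inc = (stream << 1u) | 1u;
    demo_next(rng);
    rng.state += seed;
    demo_next(rng);
}

// One "time step": states persist between calls, but the hot loop only
// touches a thread-local copy, so neighbouring array elements are not
// written on every draw.
double time_step_sum(std::vector<DemoRng>& states, int n) {
    double sum = 0.0;
    const int nthreads = static_cast<int>(states.size());
    #pragma omp parallel num_threads(nthreads)
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        DemoRng local = states[tid];   // private copy on this thread's stack
        #pragma omp for reduction(+:sum)
        for (int step = 0; step < n; ++step) {
            sum += 0.5 - static_cast<double>(demo_next(local)) /
                         static_cast<double>(UINT32_MAX);
        }
        states[tid] = local;           // save the state for the next time step
    }
    return sum;
}
```

In this sketch the vector of states would be seeded once with demo_seed (one stream per thread) and time_step_sum called at every time step. If the timings then scale like block 1, that would support the false-sharing explanation; padding each array element to its own cache line would be an alternative way to run the same test.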