0

Can someone explain to me why second for loop, with old rand(), executes a lot faster than the first loop? I have seen through multiple post here that using c++11 engine and uniform distribution are being recommended bacause rand() + multithreading is a perf bottleneck? Also as suggested I have generator and function per thread.

int main(int argc, char *argv[])
{

    std::vector<mt19937> generators(omp_get_max_threads());
    std::vector<uniform_real_distribution<double>> functions(omp_get_max_threads());
    
    for (int i = 0; i < omp_get_max_threads(); i++)
    {
        // seed is random
        generators[i] =  mt19937(1654 + 17*i);
        functions[i] =   uniform_real_distribution<double>(0.0,1.0);
    }

    double itime = omp_get_wtime();
    #pragma omp parallel for        
        for (int i =0; i < 10000000; i++)
        {
            int a = functions[omp_get_thread_num()](generators[omp_get_thread_num()]);
        }
        
    
    double end = omp_get_wtime() - itime;
    
    srand(time(NULL));
    double start = omp_get_wtime();
        #pragma omp parallel for
        for (int i =0; i < 10000000; i++)
        {
            int a = ((double) rand() / (RAND_MAX));
        }
    double endR  = omp_get_wtime() - start;

    cout << "Generator: " << end<<endl;
    cout<<"Old: "<<endR << endl;
}
François Andrieux
  • 28,148
  • 6
  • 56
  • 87
  • Also, to add. This code is built with gcc and executed on i9 processor – Andrija Veljkovic Sep 14 '20 at 16:40
  • 8
    Do note that `rand` can be a very simple and very not random PRNG. `mt19937` OTOH is a much more complicated and performance impacting algorithm and provides much more random number. – NathanOliver Sep 14 '20 at 16:44
  • 3
    Simply put, `rand` just isn't very good. Agree with @NathanOliver. – Mark Ransom Sep 14 '20 at 16:45
  • 1
    The execution time is different because the algorithm is different. Related : [Random Engine Differences](https://stackoverflow.com/q/16536617/327083), [What is the difference between a non-secure random number generator and a secure random number generator?](https://stackoverflow.com/q/101337/327083) – J... Sep 14 '20 at 16:48
  • [Funny and interesting piece on just how bad `rand` is](https://channel9.msdn.com/Events/GoingNative/2013/rand-Considered-Harmful) along with alternatives.. – user4581301 Sep 14 '20 at 17:04
  • 1
    also note that there are better PRNGs than the MT. better in terms of both speed (e.g. cycles per byte, cache pressure, etc) and in terms of statistical properties (linear complexity being the most obvious). a more recent paper on the MT is https://arxiv.org/abs/1910.06437 – Sam Mason Sep 14 '20 at 22:28
  • 1
    Two million timed calls to omp_get_thead_num() aren't going to help performance, plus the results from rand() when it is executed in parallel are meaningless. You need to put the call to rand() inside a critical section for a fair comparison. (And on parallel random numbers looks at Parallel Random Numbers: As Easy as 1, 2, 3 https://thesalmons.org/john/random123/papers/random123sc11.pdf) – Jim Cownie Sep 15 '20 at 07:28
  • 1
    Fully agree with @JimCownie Also note that even though you have separate `mt19937` instances for the threads, you most likely suffer from _false sharing_ since multiple instances are on the same cacheline. – mpoeter Sep 15 '20 at 08:12

1 Answers1

0

The comments suggest something like this

struct alignas(std::hardware_destructive_interference_size) Rng {
   std::mt19937 mers;
  std::uniform_real_distribution<double>> dist;
};

std::vector<Rng> rngs(omp_get_max_threads());

And I vaguely remember that omp has a way to set a value for each chunk so the omp_get_thread_num() all the time, and that might also make it possible to make an ref to each struct Rng.

Surt
  • 15,501
  • 3
  • 23
  • 39