Which method should be faster? The first method increments a single variable that is combined with a reduction:
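    #pragma omp parallel private(seed, x, y, i) reduction(+:counter)
    {
        // Give each thread its own seed for rand_r().
        seed = 25234 + 17 * omp_get_thread_num();
        // Each thread already runs its own prec/8 iterations, so no
        // nested "parallel for" is needed inside the parallel region.
        for (i = 0; i < prec / 8; i++) {
            x = (double)rand_r(&seed) / RAND_MAX;
            y = (double)rand_r(&seed) / RAND_MAX;
            // Count points that fall inside the unit circle.
            if (x * x + y * y < 1) {
                counter++;
            }
        }
    }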
The second method uses an array of counters, one per thread, and the sum of the array elements at the end is the result:
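    #pragma omp parallel private(seed, x, y, i, nproc)
    {
        seed = 25234 + 17 * omp_get_thread_num();
        // Index of this thread's slot in the shared counter array.
        nproc = omp_get_thread_num();
        // Each thread already runs its own prec/8 iterations, so no
        // nested "parallel for" is needed inside the parallel region.
        for (i = 0; i < prec / 8; i++) {
            x = (double)rand_r(&seed) / RAND_MAX;
            y = (double)rand_r(&seed) / RAND_MAX;
            if (x * x + y * y < 1) {
                counter[nproc]++;
            }
        }
    }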
    double time = omp_get_wtime() - start_time;
    // Sum the per-thread counters to get the total number of hits.
    int sum = 0;
    for (int i = 0; i < 8; i++) {
        sum += counter[i];
    }
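For anyone who wants to reproduce the timings, here is a minimal self-contained sketch of the two variants above. The declarations are not part of my original snippets, so treat them as assumptions: `NTHREADS`, `hit`, `count_reduction`, `count_per_thread` and `time_variant` are hypothetical names, the counters are assumed to be `long`, and `seed` is assumed to be `unsigned int` as `rand_r` requires. The sketch also leaves out the inner `#pragma omp parallel for`, assuming each thread is meant to run its own `prec/8` iterations.

```c
#define _POSIX_C_SOURCE 200809L  /* for rand_r */
#include <stdio.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch still builds without -fopenmp. */
static int omp_get_thread_num(void) { return 0; }
static double omp_get_wtime(void) { return 0.0; }
#endif

#define NTHREADS 8  /* assumed thread count, matching the hard-coded 8 */

/* One Monte Carlo sample: 1 if the random point lies inside the unit circle. */
static int hit(unsigned int *seed) {
    double x = (double)rand_r(seed) / RAND_MAX;
    double y = (double)rand_r(seed) / RAND_MAX;
    return x * x + y * y < 1;
}

/* First approach: a single counter combined with reduction(+:counter). */
long count_reduction(long prec) {
    long counter = 0;
    #pragma omp parallel num_threads(NTHREADS) reduction(+:counter)
    {
        unsigned int seed = 25234 + 17 * omp_get_thread_num();
        for (long i = 0; i < prec / NTHREADS; i++) {
            if (hit(&seed)) {
                counter++;
            }
        }
    }
    return counter;
}

/* Second approach: one array slot per thread, summed after the region. */
long count_per_thread(long prec) {
    long counter[NTHREADS] = {0};
    #pragma omp parallel num_threads(NTHREADS)
    {
        int nproc = omp_get_thread_num();
        unsigned int seed = 25234 + 17 * nproc;
        for (long i = 0; i < prec / NTHREADS; i++) {
            if (hit(&seed)) {
                counter[nproc]++;
            }
        }
    }
    long sum = 0;
    for (int i = 0; i < NTHREADS; i++) {
        sum += counter[i];
    }
    return sum;
}

/* Times one variant and prints its hit count and elapsed wall-clock time. */
void time_variant(const char *name, long (*fn)(long), long prec) {
    double start_time = omp_get_wtime();
    long hits = fn(prec);
    printf("%s: hits=%ld, %f [s]\n", name, hits, omp_get_wtime() - start_time);
}
```

Built with something like `gcc -O2 -fopenmp`, calling `time_variant("first approach", count_reduction, prec)` and `time_variant("second approach", count_per_thread, prec)` from `main` with the same `prec` should print both timings. Because every thread uses a fixed seed, both variants should report the same number of hits when they run with the same number of threads, which is a quick sanity check that they do the same work.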
Theoretically, the second way should be faster, because the threads do not share a single variable: each thread has its own counter. But when I measure the execution time, I get:
first approach: 3.72423 [s]
second approach: 8.94479 [s]
Is my reasoning wrong, or am I doing something wrong in my code?