I was trying to observe basic OpenMP parallelism with the following code:
#include <stdio.h>
#include <omp.h>
#include <stdlib.h>
#include <time.h>

int main(){
    long i;
    long x[] = {0,0,0,0};
    omp_set_num_threads(4);
    clock_t time=clock();
    #pragma omp parallel for
    for(i=0;i<100000000;i++){
        x[omp_get_thread_num()]++;
    }
    double time_taken = (double)(clock() - time) / CLOCKS_PER_SEC;
    printf("%ld %ld %ld %ld %lf\n",x[0],x[1],x[2],x[3],time_taken);
}
I am using a quad-core i5 processor and tried four different thread counts, with the following results:
Set: omp_set_num_threads(1);
Out: 100000000 0 0 0 0.203921
Set: omp_set_num_threads(2);
Out: 50000000 50000000 0 0 0.826322
Set: omp_set_num_threads(3);
Out: 33333334 33333333 33333333 0 1.448936
Set: omp_set_num_threads(4);
Out: 25000000 25000000 25000000 25000000 1.919655
The values in the x array are correct, but the measured time surprisingly increases as the number of threads increases. I cannot find any explanation for this. Is omp_get_thread_num() somehow atomic in nature, or is there something else I am missing?
Compiled with: gcc -o test test.c -fopenmp
UPDATE
Following the suggestion in the accepted answer, I modified the code as follows:
#include <stdio.h>
#include <omp.h>
#include <stdlib.h>

int main(){
    long i, t_id, fact=1096;
    long x[fact*4];
    x[0]=x[fact]=x[2*fact]=x[3*fact]=0;
    omp_set_num_threads(4);
    double time = omp_get_wtime();
    #pragma omp parallel for private(t_id)
    for(i=0;i<100000000;i++){
        t_id = omp_get_thread_num();
        x[t_id*fact]++;
    }
    double time_taken = omp_get_wtime() - time;
    printf("%ld %ld %ld %ld %lf\n",x[0],x[fact],x[2*fact],x[3*fact],time_taken);
}
Now the results make sense:
Set: omp_set_num_threads(1)
Out: 100000000 0 0 0 0.250205
Set: omp_set_num_threads(2)
Out: 50000000 50000000 0 0 0.154980
Set: omp_set_num_threads(3)
Out: 33333334 33333333 33333333 0 0.078874
Set: omp_set_num_threads(4)
Out: 25000000 25000000 25000000 25000000 0.061155
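So the slowdown came from cache line sharing, as explained in the accepted answer. The four counters in the original x array are adjacent in memory and typically share a cache line, so every increment by one thread invalidates that line for the other threads (false sharing). Spacing the counters fact elements apart puts each one in its own cache line. Note that this version also measures time with omp_get_wtime() instead of clock(). On most Unix-like systems clock() reports CPU time summed over all threads rather than wall-clock time, so the two sets of timings are not directly comparable. See the accepted answer for the full explanation.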
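For anyone who wants the same effect without the fact stride, here is a minimal sketch that pads each per-thread counter out to one cache line. It assumes a 64-byte cache line, which is common on x86 but not guaranteed, so check your own CPU. The names CACHE_LINE, MAX_THREADS, padded_counter, thread_id and count_iterations are made up for this illustration and are not from the accepted answer.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define CACHE_LINE 64   /* ASSUMED cache-line size in bytes; verify for your CPU */
#define MAX_THREADS 4   /* matches the four threads used in the question */

/* One counter per thread, padded so that consecutive counters are
   CACHE_LINE bytes apart and therefore never land in the same line. */
typedef struct {
    long value;
    char pad[CACHE_LINE - sizeof(long)];
} padded_counter;

/* Returns the OpenMP thread number, or 0 when built without -fopenmp. */
int thread_id(void)
{
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

/* Runs n loop iterations on at most nthreads threads; each thread
   increments only its own slot in counters (length >= nthreads). */
void count_iterations(long n, int nthreads, padded_counter *counters)
{
    assert(nthreads >= 1 && nthreads <= MAX_THREADS);
    #pragma omp parallel for num_threads(nthreads)
    for (long i = 0; i < n; i++) {
        counters[thread_id()].value++;
    }
}
```

To use it, zero-initialize a padded_counter counters[MAX_THREADS] array in main, call count_iterations(100000000L, 4, counters), and print counters[t].value for each t. The values should add up to the iteration count. Compile with -fopenmp as before. A simpler alternative in real code is usually a thread-local variable or a reduction clause, which avoids the shared array entirely.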