I am trying to learn multi-threaded programming using OpenMP.
To start, I was testing a nested loop with a large number of array accesses and then parallelizing it; the code is attached below. Basically, I have a fairly large array tmp that is written in the inner loop. If I make it shared, so that every thread can access and modify it, the code actually slows down as the number of threads increases, even though I have written it so that every thread writes exactly the same values into tmp. When I make tmp private, I get a speed-up roughly proportional to the number of threads. The number of operations seems to me to be exactly the same in both cases. Why does it slow down when tmp is shared? Is it because different threads try to access the same addresses at the same time?
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <omp.h>

int main(){
    int k, m, id, dummy_cntr = 5000, nthread = 10;
    long num = 10000000;
    double *x = malloc(num * sizeof(double));  /* too large for the stack, so heap-allocated */
    double tmp[dummy_cntr];                    /* the scratch array in question */
    double tm;
    clock_t st, fn;

    st = clock();
    omp_set_num_threads(nthread);

    #pragma omp parallel private(tmp, m, id)   /* tmp private: each thread gets its own copy */
    {
        id = omp_get_thread_num();
        printf("Thread no. %d \n", id);

        #pragma omp for
        for (k = 0; k < num; k++){
            x[k] = k + 1;
            for (m = 0; m < dummy_cntr; m++){
                tmp[m] = m;                    /* every thread writes the same values */
            }
        }
    }

    fn = clock();
    tm = (double)(fn - st) / CLOCKS_PER_SEC;   /* clock() sums CPU time over all threads */
    printf("CPU time: %f s\n", tm);

    free(x);
    return 0;
}
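For reference, the shared case I was comparing against differs only in the data-sharing clause (roughly; m and id still have to be private to each thread):

#pragma omp parallel private(m, id)   /* tmp left out of private(...), so it is shared by default */
{
    id = omp_get_thread_num();
    #pragma omp for
    for (k = 0; k < num; k++){
        x[k] = k + 1;
        for (m = 0; m < dummy_cntr; m++){
            tmp[m] = m;                /* all threads now write into the same array */
        }
    }
}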
P.S.: I am aware that using clock() here doesn't really give the right time, since it adds up CPU time across all threads; I have to divide it by the number of threads in this case to get output similar to what "time ./a.out" reports.
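(From what I understand, omp_get_wtime() reports wall-clock time directly, so no dividing by the thread count would be needed; something like this minimal sketch, though I kept clock() above to match what I originally ran.)

#include <stdio.h>
#include <omp.h>

int main(void){
    double t0 = omp_get_wtime();   /* wall-clock start */
    #pragma omp parallel
    {
        /* ... the same loops as above would go here ... */
    }
    double t1 = omp_get_wtime();   /* wall-clock end */
    printf("Wall time: %f s\n", t1 - t0);
    return 0;
}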