I'm trying to learn OpenMP for a program I'm writing. For part of it I'm trying to implement a function to find the average of a large array. Here is my code:

double mean(double* mean_array){
    double mean = 0;

    omp_set_num_threads( 4 );
    #pragma omp parallel for reduction(+:mean)
    for (int i=0; i<aSize; i++){
        mean = mean + mean_array[i];
    }

    printf("hello %d\n", omp_get_thread_num());

    mean = mean/aSize;

    return mean;
}

However, when I run the code it is slower than the sequential version. Also, for the print statement I get:

hello 0
hello 0

This doesn't make much sense to me. Shouldn't there be 4 hellos?

Any help would be appreciated.

user2320239
  • Nowhere in the code you posted would there be any `hello`s, so it's unclear how many there should be. At any rate, what is `aSize`? If it's small, then it is unsurprising that it is slow; there is overhead associated with starting up threads, and unless you have enough data to make the speed-up of using OpenMP appreciable, the overhead will dominate the timing. – R_Kapp Nov 19 '15 at 18:48
  • Hi, sorry, I removed the print line by accident. I've updated my code and put it back in. `aSize` is 2000000, so I think that should be big enough. – user2320239 Nov 19 '15 at 18:51
  • For the line just added in, you should only get one `hello`. It is after the `for` loop, which is the only thing you have parallelized, so it should only be run by thread `0`. It appears, however, that you call your function twice, so it is printed out twice. – R_Kapp Nov 19 '15 at 18:52
  • How are you measuring time? Are you using `omp_get_wtime()`? – R_Kapp Nov 19 '15 at 18:55
  • Thank you, the bit about the hello 0 makes sense to me now. I'm measuring time using `clock_t begin, end; double time_spent; begin = clock(); end = clock(); time_spent = (double)(end - begin) / CLOCKS_PER_SEC;` – user2320239 Nov 19 '15 at 18:57
  • See the accepted answer [here](http://stackoverflow.com/questions/10727849/no-performance-gain-after-using-openmp-on-a-program-optimize-for-sequential-runn) to understand why you should use `omp_get_wtime` instead of `clock`. – R_Kapp Nov 19 '15 at 18:58

1 Answer

First, you are not seeing 4 "hello"s because the only part of the program that executes in parallel is the parallel region introduced by #pragma omp parallel. In your code that region is just the for loop, since the combined parallel for directive applies only to the statement that follows it. The printf comes after the loop, so it is in the sequential part of the program and is run by a single thread.

Rewriting the code as follows would do the trick:

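    double mean = 0;
    #pragma omp parallel num_threads(4)
    {
      // Each thread sums its share of the iterations; the private
      // partial sums are combined into mean at the end of the loop.
      #pragma omp for reduction(+:mean)
      for (int i=0; i<aSize; i++) {
         mean += mean_array[i];
      }
      // Inside the parallel region: printed once per thread.
      printf("hello %d\n", omp_get_thread_num());
    }
    // Divide once, outside the region, so it is not repeated by every thread.
    mean /= aSize;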

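Second, the fact that your program runs slower than the sequential version can depend on multiple factors. First of all, you need to make sure the array is large enough that the overhead of creating the threads (which usually happens when the parallel region is entered) is negligible compared with the work in the loop. How you measure matters as well: as noted in the comments, clock() reports CPU time, which on most systems is summed over all threads, so it can make a parallel run look slower even when it finishes sooner, whereas omp_get_wtime() reports elapsed wall-clock time. Finally, parallel loops can suffer from "false sharing", in which threads repeatedly write to different variables that sit on the same cache line. With a reduction clause, however, each thread accumulates into its own private copy of mean, so that is unlikely to be the main cause here.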
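To illustrate the timing point raised in the comments on the question, here is a minimal sketch that times one call with both clock() and a wall-clock timer. The names wall_seconds, mean_parallel and time_mean are hypothetical helpers, not part of the original code, and the array length is passed in as n rather than read from the global aSize. The fallback to timespec_get is only there so the file still builds when OpenMP is not enabled.

```c
#include <stddef.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock seconds. Uses omp_get_wtime() when compiled with OpenMP;
   otherwise falls back to C11 timespec_get() (an assumption made here
   so the sketch also compiles without OpenMP). */
double wall_seconds(void) {
#ifdef _OPENMP
    return omp_get_wtime();
#else
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
#endif
}

/* Mean of a[0..n-1]. The reduction gives each thread a private sum
   and combines them after the loop; the division happens once,
   outside the parallel loop. */
double mean_parallel(const double *a, size_t n) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)n; i++) {
        sum += a[i];
    }
    return n > 0 ? sum / (double)n : 0.0;
}

/* Runs mean_parallel once and reports both CPU time (clock) and
   elapsed wall-clock time through the two output parameters. */
double time_mean(const double *a, size_t n, double *cpu_s, double *wall_s) {
    clock_t c0 = clock();
    double w0 = wall_seconds();
    double m = mean_parallel(a, n);
    *wall_s = wall_seconds() - w0;
    *cpu_s = (double)(clock() - c0) / CLOCKS_PER_SEC;
    return m;
}
```

With GCC this would be compiled with -fopenmp. Calling time_mean on the array and printing both figures should show whether the slowdown is real: if the clock() figure is noticeably larger than the wall-clock one, the apparent slowdown is probably an artifact of measuring CPU time across all threads.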

Anatoly
simpel01