Application in OpenMp slower than serial code c++

Question

I am struggling with OpenMp task

I have to optimize calculation of a few methods - among them is a histogram.

I spent 3 days and tried about 50 different approaches for it, but still my code is slower than the serial execution

Does anybody could enlighten me what I am doing wrong? Do you have any clues how to solve this problem?

Here is a code that needs to be optimized : (par1 usually is less than 10000, the code posted below is put inside some method and during test it is called about 100 times)

unsigned long index;
for (unsigned int par1 = 0; par1 < limit; par1++) {
    for (unsigned int par2 = 0; par2 < par1; par2++) {
        index = getDistance(par1, par2) / param;
        if (index < size) {
            hist[index]++;
        }
    }
}

...

The problem is that the amount of calculation that needs to be done per iteration is changeable - the amount of operations grows sharply when iteration counter grows

I know that the reduction is probably the best way, but I dont know how to apply it for this array

Here is one of my tries, but it is slower than serial execution, (I merged two loops into one):

  signed long max = elements * elements;
  int par1 = 0;
  int par2 = 0;
  unsigned long index;

  #pragma omp parallel for firstprivate(par1, par2, index, max, size) shared(histogram)
  for (unsigned long x = 0; x < max; x++) 
  {
        par1 = x / elements;
        par2 = x % elements;
        if(par1 > par2)
        {
          index = getDistance(par1, par2) / param;
          if (index < size) 
          {
              #pragma omp atomic
              histogram[index]++;   
          }
        }
  }

Possible duplicate of [Calculate the histogram with OpenMP](https://stackoverflow.com/questions/21777819/calculate-the-histogram-with-openmp) — Zulan, Jun 23 '17 at 08:15
Welcome to StackOverflow. Please read [ask]. Unfortunately this is a bad question for various reasons. It would have been really easy to find the duplicate I pointed out. You should have provide a more specific title. If the focus is on a specific performance issue, provide a [mcve], your specific performance observations / exceptions and detailled system specifications. — Zulan, Jun 23 '17 at 08:31
The `histogram[index]++` is serialized, and all threads are going to fight to read and write that location. This limits your gains to the frequency of `index>=size`, which are most likely eaten up by parallelization overhead. Duplicate (linked by @Zulan) has a viable solution. — peterchen, Jun 23 '17 at 13:02

Application in OpenMp slower than serial code c++

0 Answers0