-1

I have this algoritmo which scans an image and for each pixel p calculates a 256 bins histogram in which values of the pixel inside a patch around p are saved. The algorithm needs to be O(1) so a need to do many histogram addition, I'd like to make the algorithm faster by parallelizing the histogram addition with OpenMP, so I added #pragma omp parallel for before each for (just the ones with histogram additions) but it actually makes it 10 times slower. I think i need to create a parallel region outside but I don't understand how.

Also, I'm afraid the overhead generated by OpenMP overcomes the speed gained by the parallelization of a 256-for, but I don't know for sure

for (int i = 0; i < src.rows; i++) {
    for (int j = 0; j < src.cols; j++) {
        if (j == 0)
            { ... }
        else {
            if (j > side/2) { // subtract col
                for (int h = 0; h < 256; h++) // THIS ONE
                    histogram[h] -= colHisto[j - (side/2) - 1][h];
            }
            if (j < src.cols - side/2) { // add column
                if (i > side/2) { // subtract pixel
                    colHisto[j + side/2][src.at<uchar>(i - side/2 - 1, j + side/2)]--;
                }
                if (i < src.rows - side/2) { // add pixel
                    colHisto[j + side/2][src.at<uchar>(i + side/2, j + side/2)]++;
                }

                for (int h = 0; h < 256; h++) // AND THIS ONE
                    histogram[h] += colHisto[j + side/2][h];
            }
        }
    }
}
  • 3
    A histogram algorithm can never be O(1) since it has to inspect each element of the array at least once. This makes it O(n) where n is the array size (or number of patches). No amount of parallelization can change the time complexity. – Timo Sep 21 '19 at 16:06
  • please forget about the complexity (it's O(1) respect to the patch's radius) it's really irrelevant here i just need to parallelize those 256 loops, but the issue are those outer loops – Matteo Sacco Sep 21 '19 at 16:20
  • Can you add some more detail? How large is a typical input image? How large are the patches? Are you using OpenCV? What is the memory layout of the images? And most important: did you verify your results with optimizations enabled? – Timo Sep 21 '19 at 16:44
  • I need to obtain this behaviour with OpenMP: ``` #pragma omp parallel private(i,j,me,n) { #pragma omp single { for (int i = 0; i < 10; i++) { for (int j = 0; j < 10; j++) { if (j == 0) #pragma omp for for (int h = 0; h < 256; h++) // THIS ONE histogram[h] += h2[h]; } } } } } ``` – Matteo Sacco Sep 21 '19 at 17:25
  • Have you looked at the other discussions of OpenMP Histograms on SO, e.g. https://stackoverflow.com/questions/21777819/calculate-the-histogram-with-openmp? If that doesn't help, please create a [mcve] (read that page very carefully)! – Zulan Sep 21 '19 at 18:25
  • Well I can't help if you don't provide more details. And just forcing to use openmp won't solve anything. – Timo Sep 21 '19 at 18:34
  • 3
    Possible duplicate of [Calculate the histogram with OpenMP](https://stackoverflow.com/questions/21777819/calculate-the-histogram-with-openmp) – Zulan Sep 22 '19 at 08:45

1 Answers1

0

I actually solved myself by studying OpenMP more here is the code

#pragma omp parallel
{
    for (int i = 0; i < src.rows; i++) {
        for (int j = 0; j < src.cols; j++) {
            // printf("%d%d:", i, j);
            if (j == 0) { ... }
            else {
                #pragma omp single
                { ... }

                one = getTickCount();
                #pragma omp for
                for (int h = 0; h < 256; h++)
                    histogram[h] += colHisto[j + side / 2][h];
                printf("histotime = %d\n", getTickCount() - one);
            }
        }
    }
}

It's significantly faster than putting #pragma omp parallel for before each loop but still slower than the sequential version