1

I have problems parallelizing this code, I think I have to use the critical clause but I don't know how...

#include <stdio.h>
#include <sys/time.h>

#define N4 5000
#define N5 5000
#define PIXMAX 10
#define NUM_THREADS 4

int i, j, k;
int histo[PIXMAX], image[N4][N5];

void calculate_histo(int *array, int matrix[N4][N5]) {

for(i=0; i<PIXMAX; i++) array[i] = 0;

    #pragma omp parallel
    {
        int array_private[PIXMAX];
        for(i=0; i<PIXMAX; i++) array_private[i] = 0;

        #pragma omp for
        for(i=0; i<N4; i++)
            for(j=0; j<N5; j++) {
                array_private[matrix[i][j]]++;
                                }
        #pragma omp critical
        {
            for(i=0; i<PIXMAX; i++) {
                array[i] += array_private[i];
            }
        }
    }
}
main ()
{
omp_set_num_threads(NUM_THREADS);

for(i=0; i<N4; i++)
   for(j=0; j<N5; j++)
   {
     if(i%3) image[i][j] = (i+j) % PIXMAX;
     else    image[i][j] = (i+i*j) % PIXMAX;
   }
calculate_histo(histo,image);

for (k=0; k<PIXMAX; k++) printf("%9d", histo[k]);
}

I get different results each time I run it, the outputs in 5 executions:

1.- 3424378  1765911  2356499  1767451  2354765  2123619  2355686  1767270  2355937  1762464
2.- 3359050  1728213  2310171  1727858  2309947  2094584  2309402  1727705  2310021  1726228
3.- 3479377  1782549  2373773  1783920  2372319  2153420  2374614  1785481  2375290  1781468
4.- 3459613  1781119  2362956  1783067  2362662  2154083  2360726  1781994  2362982  1779394
5.- 3434711  1751408  2349619  1750327  2348681  2104916  2348510  1750427  2350599  1747760

Problems solved, all working fine, thanks for the help! the final code I use is this:

See the comments for more information, like not using global variables or using matrix[i* 5000 + j] instead of matrix[i][j]

#include<stdio.h>
#include<sys/time.h>
#include<omp.h>

#define N4 5000
#define N5 5000
#define PIXMAX 10
#define NUM_THREADS 4

int histo[PIXMAX], image[N4][N5];
int i,j,k;
void calculate_histo(int *array, int matrix[N4][N5]) {

for(i=0; i<PIXMAX; i++) array[i] = 0;

#pragma omp parallel private(i,j)
  {
    int array_private[PIXMAX];

    for(i=0; i<PIXMAX; i++)
      array_private[i] = 0;

#pragma omp for
    for(i=0; i<N4; i++)
      for( j=0; j<N5; j++) {
    array_private[matrix[i][j]]++;
      }

#pragma omp critical
    {
      for( i=0; i<PIXMAX; i++) {
    array[i] += array_private[i];
      }
    }
  }
}

int main () {

  omp_set_num_threads(NUM_THREADS);
  for( i=0; i<N4; i++)
    for( j=0; j<N5; j++) {
      if(i%3) 
        image[i][j] = (i+j) % PIXMAX;
      else
        image[i][j] = (i+i*j) % PIXMAX;
    }

  for ( k=0; k<PIXMAX; k++) 
    printf("%9d", histo[k]);
  printf("\n");

  calculate_histo(histo,image);

  for ( k=0; k<PIXMAX; k++) 
    printf("%9d", histo[k]);
  printf("\n");
  return 0;
}
  • I tried with #pragma omp parallel for private(i,j) in the outer loop, and #pragma omp parallel for private(j) in the inner loop. If I write #pragma omp critical inside the inner loop the programm doesn't end –  Jun 07 '13 at 18:20
  • 2
    That code you posted is much clearer. You should add which compiler you're using and what OS. Also, which compile options. I compiled with "g++ foo.cpp -fopenmp -O3 -Wall". I had to add "#include ". I get the same result every time "3919000 1999500 2666500 1999500 2666500 2417000 2666500 1999500 2666500 1999500". –  Jun 09 '13 at 21:36
  • that output is the correct! I use mac OSX, gcc -fopenmp -o foo foo.c –  Jun 09 '13 at 21:39
  • Hmmm...I don't know why it's not working for you. Personally, I never (almost never) use global variables. If you look at my code notice that I defined them in main as "int array[10]; int *matrix = new int[5000*5000];" The 5000x5000 matrix is large (95Mb) so I think you should allocated it dynamically (on the heap). In C use malloc. It's a bit of pain to do this with 2D arrays which is one of the reasons I do matrix[i*5000+j] instead of matrix[i][j]. –  Jun 10 '13 at 07:06
  • Also, @Massimiliano is right about the race condition on i. I always declare them in the loop decleration so it's not a problem. This is simple to fix. Just define int i in the parallel block for example just before you define array_private. That will make it private. –  Jun 10 '13 at 07:15
  • I ran your code in Visual Studio and I got different output everytime. I removed your global i,j defintion and defined them before I used them. I put "int i,j" just before "int array_private[PIXMAX];" and then I defined i and j outside of the parallel block when I needed them. Now Visual Studio gives the same result everytime. –  Jun 10 '13 at 07:22
  • Okay, I updated my answer with the modification I used to get your code working in Visual Studio. I hope that works for you. –  Jun 10 '13 at 07:25

2 Answers2

1

You could use atomic to do this but it won't be efficient. A better way is to use a private array for each thread, fill them in parallel, and then fill the shared array in a critical section. See the code below. It's also possible to do this without a critical section but it's a bit more complicated Fill histograms (array reduction) in parallel with OpenMP without using a critical section

Here is the function I recommend (I use matrix[i*5000 + j] instead of matrix[i][j] because Fortran and C do the indexing opposite of each other and I can never remember which is which).

void foo_omp_v2(int *array, int *matrix) {
    for(int i=0; i<10; i++) array[i] = 0;

    #pragma omp parallel
    {
        int array_private[10];
        for(int i=0; i<10; i++) array_private[i] = 0;

        #pragma omp for
        for(int i=0; i<5000; i++) {
            for(int j=0; j<5000; j++) {
                array_private[matrix[i*5000 + j]]++;
            }
        }
        #pragma omp critical 
        {
            for(int i=0; i<10; i++) {
                array[i] += array_private[i];
            }
        }
    }
}

Here is the full code I used showing atomic being worse

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

void foo(int *array, int *matrix) {
    for(int i=0; i<10; i++) array[i] = 0;

    for(int i=0; i<5000; i++) {
        for(int j=0; j<5000; j++) {
          array[matrix[i*5000 + j]]++;
        }
    }

    for(int i=0; i<10; i++) {
        printf("%d ", array[i]);
    } printf("\n");
}

void foo_omp_v1(int *array, int *matrix) {
    for(int i=0; i<10; i++) array[i] = 0;

    #pragma omp parallel for
    for(int i=0; i<5000; i++) {
        for(int j=0; j<5000; j++) {
            #pragma omp atomic
            array[matrix[i*5000 + j]]++;
        }
    }

    for(int i=0; i<10; i++) {
        printf("%d ", array[i]);
    } printf("\n");
}

void foo_omp_v2(int *array, int *matrix) {
    for(int i=0; i<10; i++) array[i] = 0;

    #pragma omp parallel
    {
        int array_private[10];
        for(int i=0; i<10; i++) array_private[i] = 0;

        #pragma omp for
        for(int i=0; i<5000; i++) {
            for(int j=0; j<5000; j++) {
                array_private[matrix[i*5000 + j]]++;
            }
        }
        #pragma omp critical 
        {
            for(int i=0; i<10; i++) {
                array[i] += array_private[i];
            }
        }
    }

    for(int i=0; i<10; i++) {
        printf("%d ", array[i]);
    } printf("\n");
}

int main() {
    int array[10];
    int *matrix = new int[5000*5000];
    for(int i=0; i<(5000*5000); i++) {
        matrix[i]=rand()%10;
    }

    double dtime;

    dtime = omp_get_wtime();
    foo(array, matrix);
    dtime = omp_get_wtime() - dtime;
    printf("time %f\n", dtime);

    dtime = omp_get_wtime();
    foo_omp_v1(array, matrix);
    dtime = omp_get_wtime() - dtime;
    printf("time %f\n", dtime);

    dtime = omp_get_wtime();
    foo_omp_v2(array, matrix);
    dtime = omp_get_wtime() - dtime;
    printf("time %f\n", dtime);

}

Here is the version of your code that works for me in GCC and Visual Studio

#include <stdio.h>
#include <omp.h>
//#include <sys/time.h>

#define N4 5000
#define N5 5000
#define PIXMAX 10
#define NUM_THREADS 4

int histo[PIXMAX], image[N4][N5];
void calculate_histo(int *array, int matrix[N4][N5]) {

    int i;
    for(i=0; i<PIXMAX; i++) array[i] = 0;

    #pragma omp parallel
    {
        int i,j;
        int array_private[PIXMAX];
        for(i=0; i<PIXMAX; i++) array_private[i] = 0;

        #pragma omp for
        for(i=0; i<N4; i++)
            for(j=0; j<N5; j++) {
                array_private[matrix[i][j]]++;
                                }
        #pragma omp critical
        {
            for(i=0; i<PIXMAX; i++) {
                array[i] += array_private[i];
            }
        }
    }
}
int main () {
    omp_set_num_threads(NUM_THREADS);

    int i,j;
    for(i=0; i<N4; i++)
       for(j=0; j<N5; j++)
       {
         if(i%3) image[i][j] = (i+j) % PIXMAX;
         else    image[i][j] = (i+i*j) % PIXMAX;
       }
    calculate_histo(histo,image);

    for (i=0; i<PIXMAX; i++) 
        printf("%9d", histo[i]);
        printf("\n");
}
Community
  • 1
  • 1
  • thanks for helping raxman! I have different results each time I run it :S this is my code http://pastebin.com/S5Y7tc4X –  Jun 09 '13 at 15:01
  • That's quite short code and actually it's quite clear. Why don't you update your question with that code (in case others want to look at it)? I compiled it and ran it a few times with a different number of threads and I get the same result each time? What do you mean you're getting different results every time? –  Jun 09 '13 at 20:29
1

Your program has two major issues:

  1. there is a race condition on the loop variable i and j
  2. omp_set_num_threads is implicitly declared

Here is a fixed copy of your source with the corrections highlighted:

#include<stdio.h>
#include<sys/time.h>
#include<omp.h> // Problem # 2

#define N4 5000
#define N5 5000
#define PIXMAX 10
#define NUM_THREADS 4

int histo[PIXMAX], image[N4][N5];

void calculate_histo(int *array, int matrix[N4][N5]) {

for(int i=0; i<PIXMAX; i++) array[i] = 0;

#pragma omp parallel
  {
    int array_private[PIXMAX];

    for(int i=0; i<PIXMAX; i++) // # Problem # 1
      array_private[i] = 0;

#pragma omp for
    for(int i=0; i<N4; i++)
      for(int j=0; j<N5; j++) { // # Problem # 1
    array_private[matrix[i][j]]++;
      }

#pragma omp critical
    {
      for(int i=0; i<PIXMAX; i++) {
    array[i] += array_private[i];
      }
    }
  }
}

int main () {

  omp_set_num_threads(NUM_THREADS);
  for(int i=0; i<N4; i++)
    for(int j=0; j<N5; j++) {
      if(i%3) 
        image[i][j] = (i+j) % PIXMAX;
      else
        image[i][j] = (i+i*j) % PIXMAX;
    }

  for (int k=0; k<PIXMAX; k++) 
    printf("%9d", histo[k]);
  printf("\n");

  calculate_histo(histo,image);

  for (int k=0; k<PIXMAX; k++) 
    printf("%9d", histo[k]);
  printf("\n");
  return 0;
}

For the first point I would suggest using C99 standard, which permits to declare variables within the body of a function (enhancing thus the locality of their use).

Regarding the implicit declaration: if you don't declare a function in C, then its prototype is assumed to return an int and take an undefined number of parameters. The function omp_set_num_threads is therefore implicitly declared as:

int omp_set_num_threads()

instead of:

 void omp_set_num_threads(int );

As not declaring a function is not strictly an error, compilers usually don't raise the issue if they are not explicitly told to do so. Therefore if you compile with:

gcc foo.c -fopenmp -o foo

this will go unnoticed. To avoid this kind of pitfalls it is thus generally recommended to use the maximum warning level available from the compiler:

gcc foo.c -fopenmp -Wall -Werror -pedantic -o foo
Massimiliano
  • 7,842
  • 2
  • 47
  • 62
  • good! If I compile it in cpp and using C99 standard all goes fine, finally I compile it in c using private(i,j) after the #pragma omp parallel (I couldn't use C99 in c, a lot of errors) –  Jun 10 '13 at 06:56
  • if I use for(int i=0...) in C I get errors, I use the declaration int i,j,k; and then use the private clause –  Jun 10 '13 at 07:05
  • 1
    For the program logic your solution is completely fine. The only thing is that you have global variables which are used as local indexes (which is not considered to be a good programming practice). To activate C99 you must use the flag `-std=c99`. The full compilation command is `gcc foo.c -fopenmp -std=c99 -Wall -Werror -pedantic -o foo` – Massimiliano Jun 10 '13 at 07:07
  • 1
    Good call on the race condition on i. The code did not work for me in Visual Studio until I made j private as well. –  Jun 10 '13 at 07:44
  • @HristoIliev, claims it's unnecessary to make i and j private but this code only works if you do so I don't understand. –  Jun 21 '13 at 10:23
  • @Massimiliano. I misunderstood what he meant. You actually, explained it in your answer that `i` and `j` are used in in non-parallel loops and therefore need to be made private. –  Jun 21 '13 at 11:59