
I wrote two programs, prog1.c and prog2.c, in a VMware virtual machine with 2 CPU cores. prog1.c has the OpenMP directive `#pragma omp simd`; prog2.c does not.

prog1.c

    void bgra2rgb(const char *src,char*dst,int w,int h)
    {
    #pragma omp simd
        for(int y=0;y<h;++y)
        {
            for(int x=0;x<w;++x)
            {
                dst[(y*w+x)*3  ] = src[(y*w+x)*4 + 2];
                dst[(y*w+x)*3+1] = src[(y*w+x)*4 + 1];
                dst[(y*w+x)*3+2] = src[(y*w+x)*4 + 0];
            }
        }
    }
    
    
    int main()
    {
        char bgra_mat[480*640*4];
        char rgb_mat[480*640*3];
        for(int i = 0 ; i < 1000; i++)
            bgra2rgb(bgra_mat,rgb_mat,480,640);
    
    }

prog2.c

void bgra2rgb(const char *src,char*dst,int w,int h)
{

    for(int y=0;y<h;++y)
    {
        for(int x=0;x<w;++x)
        {
            dst[(y*w+x)*3  ] = src[(y*w+x)*4 + 2];
            dst[(y*w+x)*3+1] = src[(y*w+x)*4 + 1];
            dst[(y*w+x)*3+2] = src[(y*w+x)*4 + 0];
        }
    }
}


int main()
{
    char bgra_mat[480*640*4];
    char rgb_mat[480*640*3];
    for(int i = 0 ; i < 1000; i++)
        bgra2rgb(bgra_mat,rgb_mat,480,640);
}

prog1.c is compiled with `gcc -fopt-info -mavx -fopenmp -o prog1 prog1.c`, and prog2.c with `gcc -fopenmp -o prog2 prog2.c`. But the execution times of prog1 and prog2 are the same. Why? What am I doing wrong?

lili tan
  • You forgot to enable any optimization. The default is `-O0` (debug mode), so of course GCC ignores the hint that it could vectorize! `gcc -O2 -march=native` is your best bet for seeing a speedup; `-O3` enables auto-vectorization even without OpenMP. (Although I don't know whether the normal or OpenMP vectorizer will recognize that byte shuffle: you might need to manually vectorize with intrinsics, especially since the repeating pattern length (3) isn't a multiple of 2.) – Peter Cordes Jun 01 '21 at 03:07
  • See [Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?](https://stackoverflow.com/q/53366394) about default optimization levels. – Peter Cordes Jun 01 '21 at 03:17
  • Almost a duplicate of [Efficiency of OpenMP vs optimisation levels](https://stackoverflow.com/q/64722158), but that's about parallelization, not SIMD. I finally found an actual duplicate by searching Google for `openmp simd "-O0"`: [pragma omp for simd does not generate vector instructions in GCC](https://stackoverflow.com/q/61154047) – Peter Cordes Jun 01 '21 at 05:29
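
The advice in the comments can be tried with a sketch like the following. It is not the asker's code: the flat `n_pixels` parameter, the `unsigned char` element type, and the `restrict` qualifiers are assumptions made here to give the vectorizer an easier loop. `#pragma omp simd` applies to the loop that immediately follows it, so the two nested loops are folded into one. Whether GCC actually vectorizes this 4-byte-to-3-byte shuffle depends on the GCC version and target, which is why the suggested command asks for a vectorization report.

```c
#include <stddef.h>

/* Suggested build, with optimization enabled as the comments recommend:
 *   gcc -O2 -march=native -fopenmp -fopt-info-vec -c bgra2rgb.c
 * -fopt-info-vec makes GCC report which loops it vectorized.
 */

/* Convert n_pixels BGRA pixels to RGB, dropping the alpha byte.
 * Assumption (not in the original code): src and dst never overlap,
 * which is what `restrict` promises to the compiler. */
void bgra2rgb(const unsigned char *restrict src,
              unsigned char *restrict dst,
              size_t n_pixels)
{
    /* One flat loop over pixels, so the pragma covers the loop that
     * does the actual byte copies. */
    #pragma omp simd
    for (size_t i = 0; i < n_pixels; ++i) {
        dst[i * 3 + 0] = src[i * 4 + 2];  /* R */
        dst[i * 3 + 1] = src[i * 4 + 1];  /* G */
        dst[i * 3 + 2] = src[i * 4 + 0];  /* B */
    }
}
```

For the 480x640 image in the question, the call would be `bgra2rgb(bgra_mat, rgb_mat, (size_t)480 * 640)`. Building the same file once with `-O0` and once with `-O2` and comparing the `-fopt-info-vec` output shows whether the pragma had any effect.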

0 Answers