
I am new to OpenMP and am trying to use OpenMP's SIMD directive (`#pragma omp simd`) to speed up my program, but the results are far from what I expected.

```c
/*  program: simd.c  */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
#include <math.h>

#define M 10000

int main(void) {
    float a[M], b[M];
    double t1, t2;

    /* Version 1: inner loop annotated with the OpenMP simd directive */
    t1 = omp_get_wtime();
    for (int j = 0; j < M; j++)
        #pragma omp simd
        for (int i = 0; i < M; i++) {
            a[i] = log(pow(2.71828, (pow(sin(pow(1.1, 1.1)), 1.1) + 1.0)) + j);
            b[i] = cos(log(pow(2.71828, (pow(sin(pow(1.1, 1.1)), 1.1) + 1.0)) + j));
        }
    t2 = omp_get_wtime();
    printf("simd time = %lfs\n", t2 - t1);
    printf("a[10] = %f ,b[10] = %f\n\n", a[10], b[10]);

    /* Version 2: the same loop nest without the directive */
    t1 = omp_get_wtime();
    for (int j = 0; j < M; j++)
        for (int i = 0; i < M; i++) {
            a[i] = log(pow(2.71828, (pow(sin(pow(1.1, 1.1)), 1.1) + 1.0)) + j);
            b[i] = cos(log(pow(2.71828, (pow(sin(pow(1.1, 1.1)), 1.1) + 1.0)) + j));
        }
    t2 = omp_get_wtime();
    printf("time = %lfs\n", t2 - t1);
    printf("a[10] = %f ,b[10] = %f\n\n", a[10], b[10]);

    return 0;
}
```

I run the code under WSL2, compiled with gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04).

With gcc, the two timings are almost the same.

Another thing puzzles me: why does the same program run about 300 times faster when compiled with icc instead of gcc?

```
ivan@LAPTOP-JQJBOOBT:~$ icc simd.c -qopenmp -o simd
ivan@LAPTOP-JQJBOOBT:~$ ./simd
simd time = 0.026405s
a[10] = 9.210899 ,b[10] = -0.977215

time = 0.026401s
a[10] = 9.210899 ,b[10] = -0.977215
```

I hope you can help me figure out why, or give me some advice. I'd be grateful for it!

  • `log(pow(2.71828,(pow(sin(pow(1.1,1.1)),1.1)+1.0))+j)` is loop-invariant for the inner array loop; it's basically just a memset or `std::fill`. Both compilers should hoist it out of the loop, at least if you remember to enable optimization. And much of it is constant, so constant propagation can fold it all the way down to `log(constant + j)`. – Peter Cordes May 07 '22 at 08:34
  • ICC optimizes by default and mostly behaves as if `-ffast-math` were on; GCC defaults to debug mode (`-O0` and precise FP math). – Peter Cordes May 07 '22 at 08:36
  • Thank you. So what else can I do to optimize it? – 陈亦凡 May 07 '22 at 08:58
  • `gcc -Ofast` should behave like ICC. There's nothing really to optimize here, compilers can already auto-vectorize filling arrays with a loop-invariant value even without hinting them with `#pragma omp simd`. Write a benchmark that has some work to do inside the loop if you want to compare auto-vectorization of math functions. – Peter Cordes May 07 '22 at 09:00
  • You can also use [Compiler Explorer](https://c.godbolt.org/) to find out which instructions are used by which compiler with which options. – paleonix May 07 '22 at 13:47


0 Answers