
I am struggling with Clang and GCC not using the vectorized version of sincos() from libmvec when they vectorize a loop that calls both sin() and cos(). This is related to "Vectorization of sin and cos" from 6 years ago.

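#include <math.h>   // sinf, cosf

void func( float *  p, int n, float a, float b )
{
    n = 8*4;   // overrides the parameter with a fixed size
    for( int i = 0; i < n; i++ )
    {
      // request 8-wide SIMD for the inner loop
      #pragma omp simd safelen(8) simdlen(8) aligned(p)
      for( int j = 0; j < n; j++ )
      {
        float angle = a*j+b*i;
        // sinf and cosf of the same angle: a candidate for one sincosf call
        p[j+i*n] += a * sinf( angle ) + b * cosf( angle );
      }
    }
}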
// gcc: -O3 -march=haswell -fopt-info-vec-missed -fopenmp-simd -ffast-math
// clang: -O3 -march=haswell -fopenmp-simd -ffast-math -fveclib=libmvec
// icc: -O3 -march=haswell

The assembly output for GCC, Clang, and ICC can be found on Godbolt:

GCC: https://gcc.godbolt.org/z/TnKz3YvqM does not vectorize at all; it only calls the scalar sincosf from libm. This is a known bug that seems to have been forgotten: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70901

Clang: https://gcc.godbolt.org/z/7db9xMTf6 vectorizes sin() and cos() separately, calling _ZGVdN8v_sinf and _ZGVdN8v_cosf from libmvec.

ICC: https://gcc.godbolt.org/z/5PMnznahv fully vectorizes sin and cos with a call to __svml_sincosf8_l9.

Is there any workaround to make Clang (and GCC) use libmvec's _ZGVdN8vvv_sincosf?

northwindow
  • I'm not sure I can see _why_ you'd want sincos here? In the case where you are using a single sin, and a single cos, then it's possible to speed up the evaluation by SIMD evaluating sincos(x) as: [sin(x), sin(x+pi/2)]. In the case as above, where everything has vectorised to YMM registers, there really isn't a huge saving by using a vectorised sincos. (Well, it could break a dependency chain or two, however the loop has been unrolled, so that's a bit of a moot point). – robthebloke Nov 02 '22 at 04:03
  • @robthebloke I don't see how calculating `sin(x+pi/2)` instead of `cos(x)` would help in this case. The benefit of `sincos(x)` is that you have to calculate the range reduction only once. For branchless SIMD implementations you usually have to evaluate both a `sin` and a `cos` polynomial on the `[-pi/4, pi/4]` interval and choose the correct one; for `sincos` you can use both results. – chtz Nov 02 '22 at 23:30
  • This is also not an answer to your question: If this were your actual code, you could either use the identity `a*sin(x)+b*cos(x) = hypot(a,b) * sin(x+atan2(b,a))`, or you could calculate `sincos` of `a` and `b` once and then use only the addition formulas for `sincos(a*j+b*i)`. – chtz Nov 02 '22 at 23:36

0 Answers