Is there an Intrinsic instruction for result[i] += A[k] * sin(B[k] * C[i] + D[k])?

Question

I have a simple line of code (operating on 64 bytes, i.e. 8 doubles, exactly one i7 cache line) inside a for-i loop that is nested in a for-k loop:

 result[i] += A[k] * sin(B[k] * C[i] + D[k])

I have looked through Intel's intrinsics guide but I seem to be lost: how do I search for such a function?

It is highly unlikely that there is an intrinsic for the four-parameter operation `a * sin(b * c + d)`. This isn't a fundamental operation that a CPU is likely to have a dedicated instruction for. — Raymond Chen, May 18 '16 at 05:44

score 1 · Accepted Answer · answered May 18 '16 at 10:44

Wait a minute: is i or k the inner loop? Assuming k is constant across the inner i loop, broadcast A[k] into a whole vector with _mm256_set1_pd(A[k]), and do the same for the other [k] operands (B[k] and D[k]).
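A minimal sketch of the broadcast step follows. It assumes GCC or Clang and uses `__attribute__((target("avx")))`, a compiler extension, so that it compiles without `-mavx`. The helper `broadcast_k_demo` is hypothetical: it stores the broadcast vectors only so that their contents can be inspected. In a real kernel you would do the three `_mm256_set1_pd` calls once per k, outside the inner i loop.

```c
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical demo: broadcast the k-invariant scalars into every lane.
   Each _mm256_set1_pd fills all four double lanes with the same value. */
__attribute__((target("avx")))
void broadcast_k_demo(const double *A, const double *B, const double *D,
                      size_t k, double a_out[4], double b_out[4], double d_out[4])
{
    __m256d a = _mm256_set1_pd(A[k]);   /* {A[k], A[k], A[k], A[k]} */
    __m256d b = _mm256_set1_pd(B[k]);   /* {B[k], B[k], B[k], B[k]} */
    __m256d d = _mm256_set1_pd(D[k]);   /* {D[k], D[k], D[k], D[k]} */
    _mm256_storeu_pd(a_out, a);
    _mm256_storeu_pd(b_out, b);
    _mm256_storeu_pd(d_out, d);
}
```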

As Raymond says, that's way too complex for a single instruction. Even sin() isn't implemented in hardware (except for the scalar x87 fsin instruction). Intel's intrinsics guide also lists some functions, such as vectorized sin, that only Intel's SVML (Short Vector Math Library) provides; they are not part of gcc / clang's <immintrin.h>.

Use an FMA (_mm256_fmadd_pd) for B[k] * C[i] + D[k], and pass that result to a vectorized sin() function, if you can find one.

Feed the sin() result into another FMA to do result[i] += A[k] * sin(...).

With AVX, one 64B cache line of doubles of course takes two 32B vectors.
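To illustrate the two-vector point: a 64-byte line of 8 doubles is handled as a low and a high `__m256d`, each with its own FMA. The sketch below assumes GCC/Clang and an AVX+FMA CPU. `axpy_line8` is a hypothetical helper that computes dst[0..7] += a * x[0..7]. It uses unaligned loads and stores so that it works on any pointer. If your arrays are 64-byte aligned, `_mm256_load_pd` / `_mm256_store_pd` would also work.

```c
#include <immintrin.h>

/* Hypothetical helper: dst[0..7] += a * x[0..7], one cache line as two 32B vectors. */
__attribute__((target("avx,fma")))
void axpy_line8(double *dst, double a, const double *x)
{
    __m256d va = _mm256_set1_pd(a);
    __m256d lo = _mm256_loadu_pd(dst);        /* dst[0..3] */
    __m256d hi = _mm256_loadu_pd(dst + 4);    /* dst[4..7] */
    lo = _mm256_fmadd_pd(va, _mm256_loadu_pd(x),     lo);
    hi = _mm256_fmadd_pd(va, _mm256_loadu_pd(x + 4), hi);
    _mm256_storeu_pd(dst, lo);
    _mm256_storeu_pd(dst + 4, hi);
}
```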

AVX-512 has 64B vectors, but (as of this answer) it is only available on Xeon Phi accelerator cards.

Thank you very much! BTW, could you link the intrinsics for `result[i] += A[k] * ...` (at least for 32B vectors)? — DuckQueen, May 19 '16 at 03:11
@DuckQueen: There's only one intrinsic for packed-double FMA, so there's nothing else to link. The C compiler takes care of choosing between `VFMADD132PD` / `VFMADD231PD`. There's also a `fmsub_pd` intrinsic, and [`_mm256_fnmadd_pd`](http://www.felixcloutier.com/x86/VFNMADD132PD:VFNMADD213PD:VFNMADD231PD.html) to negate the product (e.g. `d - b*c`). Since all your operations are adds, you just need the same FMA inside and outside the sin(). — Peter Cordes, May 19 '16 at 03:56
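A small sketch of how the FMA intrinsic family differs follows. `_mm256_fmadd_pd` computes b*c + d, `_mm256_fmsub_pd` computes b*c - d, and `_mm256_fnmadd_pd` computes d - b*c. The wrapper `fma_variants` is hypothetical. It assumes GCC/Clang (for the target attribute) and an FMA-capable CPU.

```c
#include <immintrin.h>

/* Hypothetical wrapper showing three FMA intrinsics on the same inputs. */
__attribute__((target("avx,fma")))
void fma_variants(const double b[4], const double c[4], const double d[4],
                  double madd[4], double msub[4], double nmadd[4])
{
    __m256d vb = _mm256_loadu_pd(b);
    __m256d vc = _mm256_loadu_pd(c);
    __m256d vd = _mm256_loadu_pd(d);
    _mm256_storeu_pd(madd,  _mm256_fmadd_pd(vb, vc, vd));   /* b*c + d */
    _mm256_storeu_pd(msub,  _mm256_fmsub_pd(vb, vc, vd));   /* b*c - d */
    _mm256_storeu_pd(nmadd, _mm256_fnmadd_pd(vb, vc, vd));  /* d - b*c */
}
```

For the question's loop only `_mm256_fmadd_pd` is needed, both for the sin() argument and for the accumulation into result[i].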
