The Intel reference manual is fairly opaque on the details of how instructions are used. There are no examples on each instruction page.
https://software.intel.com/sites/default/files/4f/5b/36945
It's possible we haven't found the right page, or the right manual yet. In any case, the syntax of gnu as might be different.
So I thought perhaps we could call intrinsics, step in and view the opcodes in gdb, and from that perhaps learn the opcodes in gnu as. we also searched for the opcodes for avx2 in gnu as but didn't find them documented.
Starting with the c++ code that isn't compiling:
#include <immintrin.h>
int main() {
__m256i a = _mm256_set_epi32(1, 2, 3, 4, 5, 6, 7, 8); // seems to work
double x = 1.0, y = 2.0, z = 3.0;
__m256d c,d;
__m256d b = _mm256_loadu_pd(&x);
// __m256d c = _mm256_loadu_pd(&y);
// __m256d d = _mm256_loadu_pd(&z);
d = _mm256_fmadd_pd(b, c, d); // c += a * b
_mm256_add_pd(b, c);
}
g++ -g We would like to be able to load a vector register %ymm0 with a single value, which appears to be supported by the intrinsic: _mm256_loadu_pd. Fused multiply-add and add are also there. All intrinsics except the first give errors.
/usr/lib/gcc/x86_64-linux-gnu/9/include/fmaintrin.h:47:1: error: inlining failed in call to always_inline ‘__m256d _mm256_fmadd_pd(__m256d, __m256d, __m256d)’: target specific option mismatch
47 | _mm256_fmadd_pd (__m256d __A, __m256d __B, __m256d __C)
| ^~~~~~~~~~~~~~~
Next, what are the syntax of the underlying assembler instructions? If you could point to a manual showing them that would be very helpful.