
Basically, my program looks like this:

#include <immintrin.h>
...
int* buf = (int*)_mm_malloc(sizeof(int) * 8, 32);
__m256i vi;
//some operations on vi
...
_mm256_store_epi32(buf, vi);
_mm_free(buf);

The compiler complained "error: ‘_mm256_store_epi32’ was not declared in this scope...note: suggested alternative: ‘_mm256_store_epi64’" when building the program (with the flags -mavx -mavx2). What puzzled me was that it compiled successfully once I replaced the call with _mm256_store_epi64. My gcc version is 7.5.0.
A similar question was posted here, but it didn't help. Can anybody suggest a workaround?

Paul R
Finley

1 Answer


According to the intrinsics guide (https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm256_store_epi32&expand=5573,5567,5567,5567), the _mm256_store_epi32 intrinsic requires these CPUID flags: AVX512VL + AVX512F.

Maybe you should use -mavx512vl -mavx512f to make this intrinsic available (provided your target hardware supports it).

prog-fh
  • Hi, my flags are `-msse -msse2 -msse3 -msse4 -msse4a -mavx -mavx2 -mavx512vl -mavx512f -std=c++11 `, but it didn't work either – Finley Aug 02 '20 at 10:37
  • @Finley: probably not all compilers bother to provide the weird `_mm256_store_epi32` AVX512 intrinsic, because it has no advantage over `_mm256_store_si256` (except a nicer prototype: a `void*` arg instead of `__m256i*`). See [How to emulate \_mm256\_loadu\_epi32 with gcc or clang?](https://stackoverflow.com/q/59649287). Either way you want it to compile to a `vmovdqa [mem], ymm`, not a 2-byte-larger `vmovdqa32 [mem], ymm` – Peter Cordes Aug 02 '20 at 13:10
  • @PeterCordes `_mm256_store_si256` worked, thanks! – Finley Aug 04 '20 at 03:29
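
For anyone landing here later, this is a minimal sketch of the question's snippet with `_mm256_store_si256` swapped in, as suggested in the comments. The function names `store_eight_ints` and `cpu_has_avx2` are hypothetical, and the add is only a placeholder for the "some operations on vi" step. It assumes GCC or Clang: the `target` attribute and `__builtin_cpu_supports` are compiler extensions, not part of the intrinsics API.

```cpp
#include <immintrin.h>  // AVX/AVX2 intrinsics, _mm_malloc, _mm_free
#include <cassert>
#include <cstdint>

// Hypothetical helper: runtime check so the AVX2 path is only taken on
// hardware that supports it (GCC/Clang builtin).
bool cpu_has_avx2() {
    return __builtin_cpu_supports("avx2") != 0;
}

// Hypothetical stand-in for the program in the question.
// The target attribute lets this function compile even without -mavx2 on
// the command line; with -mavx -mavx2 it is simply redundant.
__attribute__((target("avx2")))
bool store_eight_ints(int out[8]) {
    // 32-byte aligned buffer, as in the question.
    int* buf = static_cast<int*>(_mm_malloc(sizeof(int) * 8, 32));
    if (!buf) return false;
    // The aligned store below requires a 32-byte aligned address.
    assert(reinterpret_cast<std::uintptr_t>(buf) % 32 == 0);

    // Placeholder for "some operations on vi": {0..7} + 10 per lane.
    __m256i vi = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);
    vi = _mm256_add_epi32(vi, _mm256_set1_epi32(10));

    // Plain AVX store; the only difference from the _epi32 form at the
    // call site is the explicit cast to __m256i*.
    _mm256_store_si256(reinterpret_cast<__m256i*>(buf), vi);

    for (int i = 0; i < 8; ++i) out[i] = buf[i];
    _mm_free(buf);
    return true;
}
```

Call `store_eight_ints(out)` only after `cpu_has_avx2()` returns true. If the buffer might not be 32-byte aligned, `_mm256_storeu_si256` takes the same arguments without the alignment requirement.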