
While trying to use the `_mm_loadu_epi8` intrinsic, which belongs to Intel's AVX-512 family of intrinsics, compiling the C++ code fails with the error `‘_mm_loadu_epi8’ was not declared in this scope`.

Compiling with each of the following sets of flags produced the same error:

       1. -march=cascadelake
       2. -march=skylake-avx512
       3. -march=knl
       4. -mavx512f -mavx512vl -mavx512bw

To check whether the g++ version was the cause, the same code was compiled with g++-9 and g++-10; the error persisted with both.

What can be done to make the `_mm_loadu_epi8` intrinsic usable? A code snippet reproducing the error is below.

```cpp
#include <immintrin.h>
#include <cstdint>

using namespace std;

int main() {
    uint8_t arr[64] = {
        32, 25, 24, 16, 15, 13, 12, 19, 31, 32, 30, 29, 35, 36, 39, 40,
        32, 25, 24, 16, 15, 13, 12, 19, 31, 32, 30, 29, 35, 36, 39, 40,
        32, 25, 24, 16, 15, 13, 12, 19, 31, 32, 30, 29, 35, 36, 39, 40,
        32, 25, 24, 16, 15, 13, 12, 19, 31, 32, 30, 29, 35, 36, 39, 40 };
    // Unaligned 16-byte load; requires AVX-512BW + AVX-512VL.
    // With g++-9 and g++-10 this line fails with
    // "'_mm_loadu_epi8' was not declared in this scope".
    __m128i input16_old = _mm_loadu_epi8(arr);
    //__m128i input16_new = _mm_loadu_si128((__m128i*)arr);
    //int output = _mm_cmpneq_epi8_mask(input16_old, input16_new);
}
```

Eric Postpischil
Srihari S
  • https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95483 - fixed in GCC 11 – 273K Aug 14 '23 at 10:57
  • It's not required, it's exactly the same as `_mm_loadu_si128((__m128i*)arr)`. With no masking, your choice of intrinsic isn't going to get the compiler to emit a `vmovdqu8 xmm` when it could emit a shorter `vmovdqu xmm`. (Or conversely, with GCC missed-optimization bugs, using the old intrinsic won't always *stop* GCC from wasting machine-code bytes on `vmovdqu32`.) – Peter Cordes Aug 14 '23 at 20:29
