I wrote the test code below. With the mask set to 0b1111 or 0b0000 it works fine, but with a mask that mixes 0s and 1s, such as 0b1101 or 0b1001, the program crashes. In the debugger I see a SIGILL signal (illegal instruction) raised at _mm_mask_add_ps.
Any help is appreciated. Thanks.

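    #include <immintrin.h>
    #include <iostream>
    using namespace std;

    int main() {
        __m128 vec0 = _mm_setr_ps(1, 2, 3, 10);
        __m128 vec1 = _mm_setr_ps(4, 5, 6, 10);
        __m128 src  = _mm_setr_ps(14, 15, 16, 110);
        __mmask8 mask = 0b1101;
        // Merge-masked add: lanes whose mask bit is set get vec0 + vec1,
        // the other lanes keep the value from src.
        __m128 res = _mm_mask_add_ps(src, mask, vec0, vec1);

        // Copy the result to memory and print the four lanes.
        alignas(16) float arr[4];
        _mm_store_ps(arr, res);

        float *p = arr;
        cout << *p++ << endl;
        cout << *p++ << endl;
        cout << *p++ << endl;
        cout << *p << endl;
    }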
  • Get an AVX-512 CPU, or don't use AVX-512 instructions. (With trivial masks, the compiler can optimize the intrinsic to the SSE or AVX1 form.) Also, you probably don't want to leave `float arr[4]` uninitialized if you're doing conditional stores to it. – Peter Cordes Mar 29 '22 at 07:34
  • Or use SDE or another emulator to test AVX-512 code on a CPU without AVX-512. If you actually just wanted to do masked `_ps` stores in general, look at AVX1 `_mm_maskstore_ps` (i.e. `vmaskmovps`), but note it's slow on AMD CPUs. – Peter Cordes Mar 29 '22 at 07:40
  • @PeterCordes Thank you so much bro for your explanation and associated link you provided. – HLI Mar 29 '22 at 08:01
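
To make the comments above concrete: `_mm_mask_add_ps` on a 128-bit vector requires AVX-512F plus AVX-512VL, so on a CPU without those extensions a non-trivial mask ends up as an instruction the CPU cannot decode, hence the SIGILL. The following is only a rough sketch of two possible workarounds, not a definitive fix. `cpu_has_avx512vl`, `masked_add_ps_sse2` and `to_array` are hypothetical helper names invented for this illustration; `__builtin_cpu_supports` is a GCC/Clang builtin, so the runtime check is assumed to be unavailable on other compilers and simply reports `false` there.

```cpp
#include <immintrin.h>
#include <array>

// Hypothetical helper: reports whether the running CPU supports the
// extensions that the 128-bit _mm_mask_add_ps needs (AVX-512F + AVX-512VL).
// Assumes GCC or Clang; other compilers conservatively get "false".
inline bool cpu_has_avx512vl() {
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_cpu_supports("avx512f") &&
           __builtin_cpu_supports("avx512vl");
#else
    return false;
#endif
}

// Hypothetical helper (not a real intrinsic): emulates the merge-masking of
// _mm_mask_add_ps(src, mask, a, b) using only SSE/SSE2 instructions.
inline __m128 masked_add_ps_sse2(__m128 src, unsigned mask, __m128 a, __m128 b) {
    // Expand bit i of the mask into an all-ones or all-zeros 32-bit lane i.
    const __m128i bits  = _mm_setr_epi32(1, 2, 4, 8);
    const __m128i m     = _mm_set1_epi32(static_cast<int>(mask & 0xFu));
    const __m128i lanes = _mm_cmpeq_epi32(_mm_and_si128(m, bits), bits);
    const __m128  sel   = _mm_castsi128_ps(lanes);

    const __m128 sum = _mm_add_ps(a, b);
    // Blend: (sum & sel) | (src & ~sel)
    return _mm_or_ps(_mm_and_ps(sel, sum), _mm_andnot_ps(sel, src));
}

// Hypothetical helper: copies the four lanes into an array for inspection.
inline std::array<float, 4> to_array(__m128 v) {
    std::array<float, 4> out;
    _mm_storeu_ps(out.data(), v);  // unaligned store, no alignment assumption
    return out;
}
```

With the inputs from the question and mask `0b1101`, lanes 0, 2 and 3 would take the sums and lane 1 would keep the value from `src`, so the fallback should yield 5, 15, 9, 20. A dispatcher could call the AVX-512 intrinsic only when `cpu_has_avx512vl()` returns true, provided that code path lives in a function or translation unit compiled with AVX-512 enabled.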
