Intel's manual says the load "may" generate an exception, which is slightly curious wording:

Load 128-bits of integer data from memory into dst. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
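"Aligned on a 16-byte boundary" means the address is a multiple of 16. The sketch below makes that precondition explicit. It assumes an x86/x86-64 target with SSE2 and a compiler that ships `<emmintrin.h>` (MSVC, GCC and Clang all do). `is_aligned_16` and `load_128` are hypothetical helper names, not part of any library. The alignment-required load is only issued when the check passes.

```cpp
#include <cstdint>
#include <emmintrin.h>  // SSE2: _mm_load_si128, _mm_loadu_si128

// True when p sits on a 16-byte boundary, i.e. the address is a multiple of 16.
inline bool is_aligned_16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical helper: use the alignment-required load only when the documented
// precondition holds, otherwise fall back to the unaligned load.
inline __m128i load_128(const void* p) {
    if (is_aligned_16(p))
        return _mm_load_si128(static_cast<const __m128i*>(p));
    return _mm_loadu_si128(static_cast<const __m128i*>(p));
}
```

In practice a program usually knows statically which case it is in, so the branch mainly serves to document the rule.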

Here is my sample code. Neither function caused an exception in Debug or Release builds (using Visual Studio 2019).

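```cpp
#include <malloc.h>     // _aligned_malloc, _aligned_free (MSVC)
#include <immintrin.h>  // SSE/SSE2 intrinsics

int someMethodHeapAlloc(){
    // Only 2-byte alignment is requested, so &allocated[3] is an odd address.
    auto allocated = (bool*)_aligned_malloc(32*sizeof(bool), 2);
    auto loaded    = _mm_load_si128((__m128i*)&allocated[3]); //Here, I expect exception
    // Compare each byte with zero, then collapse the 16 byte-masks into an int.
    auto compared  = _mm_movemask_epi8(_mm_cmpeq_epi8(loaded, _mm_setzero_si128()));
    _aligned_free(allocated);
    return compared;
}

int someMethodStackAlloc(){
    alignas(2) bool allocated[32]{};
    auto loaded    = _mm_load_si128((__m128i*)&allocated[3]); //Here, I expect exception
    auto compared  = _mm_movemask_epi8(_mm_cmpeq_epi8(loaded, _mm_setzero_si128()));
    return compared;
}
```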
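For contrast, here is a minimal sketch of the two well-defined ways to do the same load. The misaligned `&data[3]` case uses `_mm_loadu_si128`, which is valid at any address. The aligned case pairs `alignas(16)` with an offset that is a multiple of 16. The function names are hypothetical. The sketch assumes `sizeof(bool) == 1`, which holds on mainstream x86 compilers but is implementation-defined.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical corrected variants of the question's functions. Each returns a
// 16-bit mask with bit i set when byte i of the loaded block is zero.

// Misaligned source: the unaligned load has no alignment precondition.
int zeroMaskUnaligned() {
    bool data[32]{};  // zero-initialized, no alignment promise
    __m128i loaded   = _mm_loadu_si128(reinterpret_cast<const __m128i*>(&data[3]));
    __m128i compared = _mm_cmpeq_epi8(loaded, _mm_setzero_si128());  // 0xFF where byte == 0
    return _mm_movemask_epi8(compared);
}

// Aligned source: alignas(16) plus a multiple-of-16 offset satisfies
// _mm_load_si128's documented requirement.
int zeroMaskAligned() {
    alignas(16) bool data[32]{};
    __m128i loaded   = _mm_load_si128(reinterpret_cast<const __m128i*>(&data[16]));
    __m128i compared = _mm_cmpeq_epi8(loaded, _mm_setzero_si128());
    return _mm_movemask_epi8(compared);
}
```

Both functions return `0xFFFF` here because every byte is zero.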
Peter Cordes
Hasan Emrah Süngü
  • Check the disassembly to confirm that a `movdqa` is being generated and not, for whatever reason, a `movdqu`. It may be that this only causes an issue at page boundaries; you should try that (allocate > 16k and try loading from every byte offset?). Obviously, you should follow the documentation even if unaligned loads sometimes work, though. – Mike Vine Sep 30 '22 at 18:09
  • Also worth checking that, even in debug, it's not optimising the load away (unlikely as that is). – Mike Vine Sep 30 '22 at 18:11
  • If you copy/pasted this code into VS, it wouldn't compile because you didn't `#include `. So in that sense it's not a [mcve]. It also doesn't have a `main`. That might have been what @HenriqueBucher was referring to. But it's not a big deal because it's still pretty easy to get asm output for these functions, which makes it obvious for people who didn't already know that MSVC and ICC avoid using alignment-required SIMD `mov...` instructions when they can. – Peter Cordes Sep 30 '22 at 19:29
  • MSVC only uses instructions that do alignment checking when it folds a load into a memory source operand without AVX enabled, like `pxor xmm0,xmm0` / `pcmpeqb xmm0, [rax]` after an aligned alloc or something. Or for NT load/store as Mysticial points out on the linked duplicate, because there is no unaligned version. See also [Is there a way to force visual studio to generate aligned sse intrinsics](https://stackoverflow.com/q/61816101) – Peter Cordes Sep 30 '22 at 19:32
  • @PeterCordes, thanks for your comment. I see you also linked the related answer. Checking the assembly, I noticed the `movdqu` instructions, just as @Mike Vine mentioned. That explains why I'm not getting any exceptions. – Hasan Emrah Süngü Sep 30 '22 at 19:33
  • @PeterCordes there were issues with the code that the author fixed. – Something Something Sep 30 '22 at 19:39
  • @HenriqueBucher: Your [mcve] comment came over 10 minutes after the last edit to the question so I assumed there were still errors. If that was a mistake, you should probably delete that comment, even if you were still annoyed that there were errors in the first place which wasted your time. – Peter Cordes Sep 30 '22 at 19:42
  • Nah, he tried to answer the question, gave a bunch of completely wrong answers, I downvoted then he deleted his answer and downvoted my question :D – Hasan Emrah Süngü Sep 30 '22 at 19:43
  • @PeterCordes The original question used _mm_cmpeq_epi8 which returned a long[2] array from an int function. Code did not even compile. He edited and changed to _mm_movemask_epi8 which returns an int. – Something Something Oct 02 '22 at 13:19
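Mike Vine's "load from every byte offset" experiment can be sketched as follows. This version uses `_mm_loadu_si128`, so it runs safely and checks each SIMD result against a scalar reference. The buffer crosses several page boundaries, assuming 4 KiB pages. Swapping in `_mm_load_si128` would turn it into the fault test: at offsets that are not multiples of 16, it may then crash if the compiler emits `movdqa` or folds the load into an SSE memory operand. The function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>
#include <emmintrin.h>  // SSE2 intrinsics

// Scalar reference: bit i set when p[i] == 0, for the 16 bytes at p.
int zeroMaskScalar(const unsigned char* p) {
    int mask = 0;
    for (int i = 0; i < 16; ++i)
        if (p[i] == 0) mask |= 1 << i;
    return mask;
}

// Hypothetical sweep: load 16 bytes from every byte offset of a buffer larger
// than 16 KiB and compare the SIMD zero-byte mask with the scalar one.
bool sweepAllOffsets(std::size_t bytes = 4 * 4096 + 16) {
    std::vector<unsigned char> buf(bytes);
    for (std::size_t i = 0; i < bytes; ++i)
        buf[i] = (i % 7 == 0) ? 0 : 1;  // arbitrary pattern containing some zeros
    for (std::size_t off = 0; off + 16 <= bytes; ++off) {
        const unsigned char* p = buf.data() + off;
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));  // any alignment
        int simd = _mm_movemask_epi8(_mm_cmpeq_epi8(v, _mm_setzero_si128()));
        if (simd != zeroMaskScalar(p)) return false;
    }
    return true;
}
```

Loads that stay inside the allocation never fault just because they cross a page. With the aligned intrinsic, only the address alignment matters.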

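Peter Cordes's point about folded loads can be illustrated with the source pattern that invites folding. The sketch compares a 16-byte block against zero. An optimizing compiler may emit it as `pxor xmm0, xmm0` / `pcmpeqb xmm0, [mem]`. Without AVX, that legacy-SSE memory operand requires 16-byte alignment. A misaligned pointer could therefore fault there, even though a standalone `_mm_load_si128` might have been compiled to `movdqu`. With VEX (AVX) encoding, the memory operand of `vpcmpeqb` has no alignment requirement. Whether folding happens depends on the compiler and flags. `countZeroBytesAligned` is a hypothetical name.

```cpp
#include <bitset>
#include <emmintrin.h>  // SSE2 intrinsics

// Counts the zero bytes in a 16-byte block. The caller must pass a 16-byte-aligned p,
// because the load below may be folded into pcmpeqb's memory operand.
int countZeroBytesAligned(const unsigned char* p) {
    __m128i v  = _mm_load_si128(reinterpret_cast<const __m128i*>(p));  // requires p % 16 == 0
    __m128i eq = _mm_cmpeq_epi8(_mm_setzero_si128(), v);               // 0xFF where byte == 0
    unsigned mask = static_cast<unsigned>(_mm_movemask_epi8(eq));      // one bit per byte
    return static_cast<int>(std::bitset<16>(mask).count());
}
```

Passing an aligned buffer keeps the call well-defined whether or not the compiler folds the load.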
0 Answers