I'm using both SSEx and AVXx intrinsics. With Intel SSE2 or AVX2, when I want to load a vector from memory I use the following intrinsics (the data type is int):

_mm_load_si128( (__m128i *)&a[ i ][ j ]);
_mm256_load_si256( (__m256i *)&a[ i ][ j ]);

and when the data type is float I use the following:

_mm_load_ps(&a[ i ][ j ]);
_mm256_load_ps(&a[ i ][ j ]);

So the question is: what is the difference between loading float and int data from memory, such that one needs a (type *) cast and the other does not?

ADMS
  • 2
    Yet another inconsistent design decision for intrinsics that leads to cluttered and hard-to-read code. Of course, an integer vector can be 4 ints, 8 shorts, or 16 chars, so there's no obvious type for the pointer. `char*` can alias anything, so requiring casts to `char*` might possibly have led some compilers to make worse code. `void*` would have been nice, esp. for C where conversion to `void*` can happen implicitly, without a cast. – Peter Cordes May 03 '16 at 06:06
  • Is it just a compilation issue? Do you think there might be other reasons? IDK, maybe a difference between floating-point and integer storage, or something about the hardware design? Not only a software problem? – ADMS May 03 '16 at 08:39
  • Well, it is not just an *int* or *float*, it is 4 or 8 of them. The most likely mishap next is not incrementing j correctly or your program crashing because the array element is not aligned correctly. If the array is allocated on the free store then you only have 50% / 25% odds that it won't crash. – Hans Passant May 03 '16 at 09:25
  • IDK, are you sure there is a connection between this syntax and the misalignment problem? – ADMS May 03 '16 at 09:52
  • 1
    I think [this answer](http://stackoverflow.com/questions/24787268/how-to-implement-mm-storeu-epi64-without-aliasing-problems/24788226#24788226) explains this well. Note that this is fixed with the AVX512 intrinsics. – Z boson May 03 '16 at 13:00
  • 2
    @ADMS: There's nothing hardware-related. You can use `movaps`/`movups` to store integer data with no penalty on all known CPUs (and its encoding is one byte shorter than `movdqa`). I forget if any CPUs care about using `movaps` to load integer data. (The difference matters on Nehalem for reg-reg moves, because it doesn't handle them in the rename stage, and it has separate bypass forwarding domains for int vs. float). @ZBoson: Thanks for the link. I hadn't noticed AVX512 used `void*`, that's great. And basically proves my point. – Peter Cordes May 03 '16 at 14:51
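
To make the point from the comments concrete, here is a minimal sketch in C, assuming an x86 target with SSE2. The helper names `sum4_int` and `sum4_float` are hypothetical and exist only for illustration. The integer load is declared with a `__m128i const*` parameter, because an integer vector has no single element type, so an `int*` needs a cast. The float load is declared with a `float const*` parameter, so a `float*` is passed as-is. Both loads require 16-byte alignment, which the asserts check.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 integer intrinsics; also provides the SSE float intrinsics */

/* Hypothetical helper: sums four ints starting at a 16-byte-aligned address. */
static int sum4_int(const int *p)
{
    assert(((uintptr_t)p % 16) == 0);  /* aligned load: address must be a multiple of 16 */
    /* The parameter type is __m128i const*, so the int pointer must be cast. */
    __m128i v = _mm_load_si128((const __m128i *)p);
    int out[4];
    _mm_storeu_si128((__m128i *)out, v);  /* unaligned store into a scratch array */
    return out[0] + out[1] + out[2] + out[3];
}

/* Hypothetical helper: sums four floats starting at a 16-byte-aligned address. */
static float sum4_float(const float *p)
{
    assert(((uintptr_t)p % 16) == 0);  /* same alignment requirement as above */
    /* The parameter type is float const*, so no cast is needed. */
    __m128 v = _mm_load_ps(p);
    float out[4];
    _mm_storeu_ps(out, v);
    return out[0] + out[1] + out[2] + out[3];
}
```

A caller would declare its data with explicit alignment, for example `_Alignas(16) int a[4] = {1, 2, 3, 4};`, and pass `a` to `sum4_int`. The cast in the integer version only satisfies the declared parameter type; it does not change what is loaded.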
