I'm using both SSEx and AVXx intrinsics. When I use Intel SSE2 or AVX2 and want to load a vector from memory, I use the following instructions (the data type is int):
_mm_load_si128( (__m128i *)&a[ i ][ j ]);
_mm256_load_si256( (__m256i *)&a[ i ][ j ]);
When the data type is float, I use the following instead:
_mm_load_ps(&a[ i ][ j ]);
_mm256_load_ps(&a[ i ][ j ]);
So my question is: what is the difference between loading float and int data from memory, such that the int load needs a (type *) cast and the float load does not?
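
To make the comparison concrete, here is a minimal SSE2-only sketch of the two load styles side by side (the AVX2 calls follow the same pattern with __m256i and __m256). The helper names sum4_int and sum4_float are made up for illustration, and the input pointers are assumed to be 16-byte aligned, which the aligned load intrinsics require.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics; also pulls in the SSE float intrinsics */
#include <stdint.h>

/* Hypothetical helper: load four 32-bit ints and add them up. */
static int sum4_int(const int *p)
{
    /* Assumption: the caller passes a 16-byte-aligned pointer. */
    assert((uintptr_t)p % 16 == 0);

    /* _mm_load_si128 is declared to take a `__m128i const *`,
       so an int pointer has to be cast explicitly. */
    __m128i v = _mm_load_si128((const __m128i *)p);

    _Alignas(16) int out[4];
    _mm_store_si128((__m128i *)out, v); /* the store needs the same cast */
    return out[0] + out[1] + out[2] + out[3];
}

/* Hypothetical helper: load four floats and add them up. */
static float sum4_float(const float *p)
{
    /* Assumption: the caller passes a 16-byte-aligned pointer. */
    assert((uintptr_t)p % 16 == 0);

    /* _mm_load_ps is declared to take a `float const *`,
       so a float pointer can be passed as-is. */
    __m128 v = _mm_load_ps(p);

    _Alignas(16) float out[4];
    _mm_store_ps(out, v);
    return out[0] + out[1] + out[2] + out[3];
}
```

In this sketch the only difference between the two helpers at the call site is the pointer cast on the integer load and store; the float versions accept the element pointer directly.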