Basically, how can I write the equivalent of this with AVX2 intrinsics? We assume here that result_in_float is of type __m256, while result is of type short int* or short int[8].
for (i = 0; i < 8; i++)
    result[i] = (short int)result_in_float[i];
I know that floats can be converted to 32-bit integers using the __m256i _mm256_cvtps_epi32(__m256 m1) intrinsic, but I have no idea how to convert these 32-bit integers further to 16-bit integers. Beyond the conversion itself, I also want to store those values (as 16-bit integers) to memory, and I want to do all of it using vector instructions.
Searching around the internet, I found an intrinsic by the name of _mm256_mask_storeu_epi16, but I'm not really sure if that would do the trick, as I couldn't find an example of its usage.
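For reference, here is a minimal sketch of one way the conversion and store described above could be done with AVX2 alone. It is an illustration under assumptions, not a definitive answer: the helper name convert8_float_to_int16 is hypothetical, the load from src stands in for an existing __m256 value such as result_in_float, and the target attribute is a GCC/Clang extension used so the snippet compiles without -mavx2. As far as I can tell, _mm256_mask_storeu_epi16 belongs to AVX-512 (AVX-512BW with AVX-512VL) rather than AVX2, so the sketch uses a plain 128-bit store instead.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: converts 8 floats to 8 int16_t values and stores them.
   The target attribute (GCC/Clang extension) enables AVX2 for this function only. */
__attribute__((target("avx2")))
static void convert8_float_to_int16(const float *src, int16_t *dst)
{
    /* Stand-in for an existing __m256 such as result_in_float. */
    __m256 v = _mm256_loadu_ps(src);

    /* Truncate toward zero, like the C cast in the scalar loop.
       _mm256_cvtps_epi32 would instead round using the current rounding mode. */
    __m256i i32 = _mm256_cvttps_epi32(v);

    /* Split the 256-bit vector into its two 128-bit halves. */
    __m128i lo = _mm256_castsi256_si128(i32);      /* elements 0..3 */
    __m128i hi = _mm256_extracti128_si256(i32, 1); /* elements 4..7 */

    /* Narrow 32-bit to 16-bit with signed saturation: values outside
       [-32768, 32767] are clamped, which differs from a plain C cast. */
    __m128i i16 = _mm_packs_epi32(lo, hi);

    /* Unaligned 128-bit store of the eight 16-bit results. */
    _mm_storeu_si128((__m128i *)dst, i16);
}
```

Splitting into 128-bit halves before packing keeps the elements in their original order. The 256-bit _mm256_packs_epi32 packs within each 128-bit lane separately, so using it directly would require an extra permute afterwards.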