Need to correctly convert YMM with 8 int32_t to XMM with 8 UNSIGNED uint8_t at the bottom, using AVX intrinsics. It should be analogue of static_cast<uint8_t>
. It means that C++ standard rules work (modular reduction). So we get truncation of the 2's complement bit-pattern.
For example, (int32_t)(-1)
-> (uint8_t)(255)
, and +200
-> (uint8_t)(200)
so we can't use signed or unsigned saturation to 8-bit (or even to 16-bit as an intermediate step).
I have this code as the example:
packssdw xmm0, xmm0
packuswb xmm0, xmm0
movd somewhere, xmm0
But these commands use unsigned saturation, so we get (int32_t)(-1)
-> (uint8_t)(0)
.
I know vcvttss2si
and it works correctly but only for one value. For the best performance I want to use vector registers.
Also I know about shuffling but it's enough slow for me.
So Is there another way to convert from int32_t YMM to uint8_t YMM as static_cast<uint8_t>
?
UPD: The comment of @chtz is answer of my question.