Convert int32_t to unsigned char. AVX

Question

I need to convert a YMM register holding 8 int32_t values to an XMM register holding 8 uint8_t values in its low 64 bits, using AVX intrinsics. The conversion should behave like static_cast<uint8_t>, i.e. follow the C++ standard rules (modular reduction), so the result is a truncation of the two's complement bit pattern.

For example, (int32_t)(-1) -> (uint8_t)(255), and +200 -> (uint8_t)(200) so we can't use signed or unsigned saturation to 8-bit (or even to 16-bit as an intermediate step).
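
For reference, the target semantics can be written as a scalar sketch. The helper name `truncate_to_u8` is hypothetical and only serves to pin down the expected results:

```cpp
#include <cstdint>

// Hypothetical scalar reference: keep only the low 8 bits of the
// two's complement bit pattern (modular reduction mod 256).
constexpr uint8_t truncate_to_u8(int32_t x) {
    return static_cast<uint8_t>(x);
}

// The two cases from the text, plus one wrap-around case.
static_assert(truncate_to_u8(-1) == 255, "-1 wraps to 255");
static_assert(truncate_to_u8(200) == 200, "200 is kept as-is");
static_assert(truncate_to_u8(256) == 0, "256 wraps to 0");
```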

I have this code as an example:

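packssdw xmm0, xmm0    ; int32 -> int16 with signed saturation
packuswb xmm0, xmm0    ; int16 -> uint8 with unsigned saturation
movd somewhere, xmm0   ; store the low 4 bytes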

But packssdw uses signed saturation and packuswb uses unsigned saturation, so we get (int32_t)(-1) -> (uint8_t)(0).

I know about vcvttss2si, and it works correctly, but only for one value. For the best performance I want to use vector registers.

I also know about shuffling, but it's too slow for me.

So is there another way to convert from int32_t YMM to uint8_t YMM that behaves like static_cast<uint8_t>?

UPD: The comment by @chtz answers my question.

If you have the idea that clear, why not just program it in assembler directly? — Ted Lyngmo, Jun 03 '22 at 21:02
It's unclear what your question is. What is YMM? How would you fit a DWORD into eight bits? — Ulrich Eckhardt, Jun 03 '22 at 21:04
@TedLyngmo Which idea? I show 3 variants and none of them works for me. So I'm asking for advice on how to correctly convert (like static_cast in C++) a YMM with 8 int32_t values to a YMM with 8 uint8_t values using intrinsics — alexa, Jun 03 '22 at 21:14
@UlrichEckhardt YMM is a vector register in the AVX ISA (XMM for SSE, ZMM for AVX512). Its size is 256 bits, so 8 int32_t values fill a YMM completely — alexa, Jun 03 '22 at 21:16
@alexa I changed my idea of the question to that it needs clarity. — Ted Lyngmo, Jun 03 '22 at 21:16
To avoid saturation behavior you can bit-and with `0x000000ff` before packing. — chtz, Jun 03 '22 at 23:47
Another option is [using pshufb](https://stackoverflow.com/a/70824699/555045) but it's inconvenient for YMM registers. — harold, Jun 05 '22 at 14:18
@chtz Do you maybe also know a similar way to do the int32_t->int8_t conversion without saturation and without `pshufb`? It'd be perfect! — alexa, Jun 08 '22 at 14:39
@alexa For `int32_t` to `int8_t` without saturation, for what values do you expect different output than for `uint8_t`? (Note that the C++ behavior for signed overflow is undefined). — chtz, Jun 08 '22 at 15:17
@chtz I want the same behavior as `static_cast<int8_t>(int32_t)`. I mean `int32_t(128)->int8_t(-128)`. But `packuswb` uses saturation (with and without your advice about `0x000000ff`), so I get `int32_t(128)->int8_t(127)`. For me that's an incorrect result — alexa, Jun 09 '22 at 13:17
You still need to use `packuswb`: `int32_t(-128) = 0xffffff80 -> 0x80 -> uint8_t(0x80) = int8_t(-128)` — chtz, Jun 11 '22 at 10:10
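
The approach suggested in the comments (bit-and with `0x000000ff`, then pack) can be sketched roughly as follows. This is a minimal sketch, not a tuned implementation. The function name `truncate_epi32_to_u8` is hypothetical, and the input is assumed to be 8 int32_t values in memory, processed as two 128-bit halves so that only SSE2 is needed. With the values already in a YMM register, the two halves could be obtained with `_mm256_castsi256_si128` and `_mm256_extractf128_si256` instead of the loads.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical helper: truncate 8 int32_t values to 8 bytes,
// matching static_cast<uint8_t> for each element.
inline void truncate_epi32_to_u8(const int32_t* src, uint8_t* dst) {
    const __m128i mask = _mm_set1_epi32(0x000000ff);

    // Load both halves and keep only the low byte of each 32-bit lane.
    __m128i lo = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
    __m128i hi = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + 4));
    lo = _mm_and_si128(lo, mask);
    hi = _mm_and_si128(hi, mask);

    // Every lane is now in 0..255, so neither pack can saturate.
    __m128i words = _mm_packs_epi32(lo, hi);         // 8 x 16-bit, in order
    __m128i bytes = _mm_packus_epi16(words, words);  // 8 bytes in the low half

    // Store the low 64 bits (8 bytes).
    _mm_storel_epi64(reinterpret_cast<__m128i*>(dst), bytes);
}
```

The stored bytes are plain bit patterns, so the signed case needs no separate code path: reading the same byte as int8_t turns 0x80 into -128, which is the behavior asked for in the int32_t->int8_t follow-up.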

0 Answers