
I need to convert a YMM register holding 8 int32_t values to an XMM register holding 8 unsigned uint8_t values in its low bytes, using AVX intrinsics. The conversion should be the analogue of static_cast<uint8_t>, i.e. the C++ standard rules apply (modular reduction), so the result is a truncation of the two's complement bit pattern.

For example, (int32_t)(-1) -> (uint8_t)(255) and +200 -> (uint8_t)(200), so we can't use signed or unsigned saturation to 8 bits (or even to 16 bits as an intermediate step).

I have this code as the example:

packssdw xmm0, xmm0
packuswb xmm0, xmm0
movd somewhere, xmm0

But these instructions saturate (packssdw with signed saturation, packuswb with unsigned saturation), so we get (int32_t)(-1) -> (uint8_t)(0).

I know about vcvttss2si, and it works correctly, but only for one value. For best performance I want to use vector registers.

I also know about shuffling, but it's too slow for me.

So is there another way to convert int32_t values in a YMM register to uint8_t values, behaving like static_cast<uint8_t>?

UPD: The comment by @chtz answers my question.
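A minimal sketch of the approach from @chtz's comment (mask each 32-bit lane with 0x000000ff, then pack), assuming GCC or Clang. The names trunc_epi32_to_epu8 and convert8 are hypothetical, and splitting the YMM register into two XMM halves is my own choice so that only AVX (not AVX2) is required and the bytes come out in order:

```cpp
#include <immintrin.h>
#include <cstdint>

// Hypothetical helper: truncates 8 x int32_t to 8 x uint8_t (low 8 bytes of
// the result), matching static_cast<uint8_t> for every lane.
// The target attribute is a GCC/Clang extension (an assumption of this sketch);
// alternatively drop it and compile with -mavx.
__attribute__((target("avx")))
static inline __m128i trunc_epi32_to_epu8(__m256i v) {
    const __m128i mask = _mm_set1_epi32(0xFF);

    // Keep only the low byte of each lane, so every value is in 0..255
    // and the saturating packs below can never clamp anything.
    __m128i lo = _mm_and_si128(_mm256_castsi256_si128(v), mask);      // lanes 0..3
    __m128i hi = _mm_and_si128(_mm256_extractf128_si256(v, 1), mask); // lanes 4..7

    __m128i w16 = _mm_packs_epi32(lo, hi);   // packssdw: 8 x int16, in order
    return _mm_packus_epi16(w16, w16);       // packuswb: 8 x uint8 in the low 8 bytes
}

// Hypothetical wrapper: converts 8 values from memory to memory.
__attribute__((target("avx")))
void convert8(const int32_t* src, uint8_t* dst) {
    __m256i v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(src));
    _mm_storel_epi64(reinterpret_cast<__m128i*>(dst), trunc_epi32_to_epu8(v));
}
```

The same bytes can be read back as int8_t if the signed interpretation is wanted, since only the bit pattern matters: 128 becomes 0x80, which is -128 as int8_t.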

alexa
  • If you have the idea that clear, why not just program it in assembler directly? – Ted Lyngmo Jun 03 '22 at 21:02
  • It's unclear what your question is. What is YMM? How would you fit a DWORD into eight bits? – Ulrich Eckhardt Jun 03 '22 at 21:04
  • @TedLyngmo which idea? I showed 3 variants and none of them works for me. So I'm asking for advice on how to correctly convert (as static_cast does in C++) a VMM with 8 int32_t values to a VMM with 8 UNSIGNED int8_t values using intrinsics – alexa Jun 03 '22 at 21:14
  • @UlrichEckhardt YMM is a vector register in the AVX ISA (XMM for SSE, ZMM for AVX512). Its size is 256 bits, so 8 int32_t values fill a YMM register completely – alexa Jun 03 '22 at 21:16
  • @alexa I changed my idea of the question to that it needs clarity. – Ted Lyngmo Jun 03 '22 at 21:16
  • To avoid saturation behavior you can bit-and with `0x000000ff` before packing. – chtz Jun 03 '22 at 23:47
  • @chtz it works the way I want! Thanks a lot! – alexa Jun 05 '22 at 13:57
  • Another option is [using pshufb](https://stackoverflow.com/a/70824699/555045) but it's inconvenient for YMM registers. – harold Jun 05 '22 at 14:18
  • @chtz do you maybe also know a similar way to do the int32_t->int8_t conversion without saturation and without `pshufb`? It'd be perfect! – alexa Jun 08 '22 at 14:39
  • @alexa For `int32_t` to `int8_t` without saturation, for what values do you expect different output than for `uint8_t`? (Note that the C++ behavior for signed overflow is undefined). – chtz Jun 08 '22 at 15:17
  • @chtz I want behavior similar to `static_cast<int8_t>` applied to an `int32_t`, i.e. `int32_t(128)->int8_t(-128)`. But `packuswb` uses saturation (both with your advice to use `0x000000ff` and without it), so I get `int32_t(128)->int8_t(127)`, which is an incorrect result for me – alexa Jun 09 '22 at 13:17
  • You still need to use `packuswb` `int32_t(-128) = 0xffffff80 -> 0x80 -> uint8_t(0x80) = int8_t(-128)` – chtz Jun 11 '22 at 10:10

0 Answers