
I'm using SSE and I want to duplicate the last byte of each doubleword of XMM0 four times, so that it fills the whole doubleword, but I don't know how to do it. (Maybe with (un)packs?)

To illustrate, I'd like to do this.

Thanks for your help!

  • Which versions of SSE are available? SSSE3 would make this easy – harold Jan 03 '19 at 22:42
  • Without `pshufb`, you might mask with `set1_epi32(0x000000ff)`, then shift + OR. Then `pshuflw` / `pshufhw` to broadcast bytes. That's probably more efficient than masking + `packusdw` / `wb` down to words and bytes, and then `punpcklbw` / `wd` back up to dwords. – Peter Cordes Jan 04 '19 at 02:46
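
The SSE2-only route described in the comment above could look roughly like this with intrinsics. This is only a sketch under assumptions: it takes "last byte" to mean the low byte of each dword (bytes 0, 4, 8 and 12 in memory), and the function names `broadcast_low_bytes_sse2` and `broadcast_low_bytes_u32x4` are made up for illustration.

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 */

/* Broadcast the low byte of each 32-bit lane to all four bytes of that
   lane, using SSE2 only (hypothetical helper name). */
static __m128i broadcast_low_bytes_sse2(__m128i v)
{
    /* Keep only the low byte of each dword:        00 00 00 b */
    v = _mm_and_si128(v, _mm_set1_epi32(0x000000ff));
    /* Copy it into the second byte (shift + OR):   00 00 b  b */
    v = _mm_or_si128(v, _mm_slli_epi32(v, 8));
    /* Each dword's low word now holds b:b. Copy word 0 over word 1 and
       word 2 over word 3 (pshuflw), then the same in the high half (pshufhw). */
    v = _mm_shufflelo_epi16(v, _MM_SHUFFLE(2, 2, 0, 0));
    v = _mm_shufflehi_epi16(v, _MM_SHUFFLE(2, 2, 0, 0));
    return v;
}

/* Array wrapper; unaligned load/store, so `in` and `out` need no alignment. */
void broadcast_low_bytes_u32x4(const uint32_t in[4], uint32_t out[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, broadcast_low_bytes_sse2(v));
}
```

For example, the dword `0x11223344` would become `0x44444444`. SSE2 is baseline on x86-64, so this needs no extra compiler flags there.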

1 Answer


You can do this with the SSSE3 instruction PSHUFB, like this (MASM 32-bit assembly):

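.data
  align 16
  ; pshufb control bytes, lowest address (byte 0) first.
  ; MASK is a MASM operator, so the label uses a different name.
  shuf_mask db 0,0,0,0, 4,4,4,4, 8,8,8,8, 12,12,12,12
.code
  ; value in XMM0                       ; 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
  pshufb xmm0, xmmword ptr [shuf_mask]  ; 12 12 12 12 08 08 08 08 04 04 04 04 00 00 00 00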

That the output seems to match the mask is a coincidence.
I couldn't test this at the moment, so the order of the mask bytes may be reversed, but you should get the idea.

Anyway, take care of alignment, because:

When the source operand is a 128-bit memory operand, the operand must be aligned on a 16-byte boundary or a general-protection exception (#GP) will be generated.
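
For comparison, here is a rough sketch of the same shuffle written with C intrinsics instead of MASM. It is not part of the original answer. The function name `broadcast_low_bytes_ssse3` is made up, and the `target("ssse3")` attribute is a GCC/Clang-specific assumption so that the file builds without `-mssse3`. Because the control vector is a compiler-managed constant and the data is loaded with `_mm_loadu_si128`, the alignment requirement quoted above does not apply to the caller's buffers.

```c
#include <stdint.h>
#include <tmmintrin.h>  /* SSSE3: _mm_shuffle_epi8 (pshufb) */

/* Assumption: GCC or Clang. The attribute enables SSSE3 for this one
   function; with other compilers, drop it and enable SSSE3 globally. */
__attribute__((target("ssse3")))
void broadcast_low_bytes_ssse3(const uint32_t in[4], uint32_t out[4])
{
    /* Control bytes in memory order (element 0 first), matching the
       `db 0,0,0,0, 4,4,4,4, ...` line: each output byte i takes the
       input byte whose index is stored at position i. */
    const __m128i ctrl = _mm_setr_epi8(0, 0, 0, 0,   4, 4, 4, 4,
                                       8, 8, 8, 8,   12, 12, 12, 12);

    /* Unaligned load: no 16-byte alignment needed for `in`. */
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    v = _mm_shuffle_epi8(v, ctrl);
    _mm_storeu_si128((__m128i *)out, v);
}
```

The CPU still has to support SSSE3 at run time; the attribute only affects code generation.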

Peter Cordes
zx485
  • Your mask *was* backwards (before my edit). That order would be correct for a `_mm_set_epi8`, which takes args in high..low order. But the low element (index 0) is loaded/stored from the lowest address in memory, so this also reverses the order of the dwords in your register. The notation in your comment is Intel's normal ordering which is opposite of memory order (C array initializers, and asm `db`). See [Convention for displaying vector registers](https://stackoverflow.com/q/41351087) for more discussion about big vs. little "endian" vector notation. – Peter Cordes Jan 04 '19 at 02:39
  • You'd normally put a vector constant in `.rdata` (read-only data on Windows), not `.data`. (Or `.rodata` on non-Windows.) – Peter Cordes Jan 04 '19 at 02:43
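
The ordering point from the comments can be checked with a small sketch. The function names here are hypothetical. `_mm_set_epi8` lists elements high to low, while `_mm_setr_epi8` lists them in memory order, the same order as an asm `db` line or a C array initializer.

```c
#include <stdint.h>
#include <string.h>
#include <emmintrin.h>  /* SSE2 */

/* _mm_set_epi8 takes its arguments high..low: the LAST argument is
   element 0, which ends up at the lowest address when stored. */
void store_set_epi8(uint8_t out[16])
{
    __m128i v = _mm_set_epi8(15, 14, 13, 12, 11, 10, 9, 8,
                             7, 6, 5, 4, 3, 2, 1, 0);
    _mm_storeu_si128((__m128i *)out, v);
}

/* _mm_setr_epi8 takes its arguments low..high, i.e. in memory order. */
void store_setr_epi8(uint8_t out[16])
{
    __m128i v = _mm_setr_epi8(0, 1, 2, 3, 4, 5, 6, 7,
                              8, 9, 10, 11, 12, 13, 14, 15);
    _mm_storeu_si128((__m128i *)out, v);
}

/* Returns 1 if both constants have identical bytes in memory. */
int set_and_setr_match(void)
{
    uint8_t a[16], b[16];
    store_set_epi8(a);
    store_setr_epi8(b);
    return memcmp(a, b, sizeof a) == 0;
}
```

Both stores write the bytes 0, 1, ..., 15 in ascending address order, so a mask written for `_mm_set_epi8` has to be reversed when it is copied into a `db` line.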