PMOVMSKB does a really nice job of packing byte fields into bits.
However, I want to do the reverse.
I have a bit field of 16 bits that I want to expand into an XMM register, one byte field per bit.
Preferably, a set bit should set the MSB (0x80) of its byte field, but I can live with a set bit producing 0xFF in the byte field.

I've seen the following option on https://software.intel.com/en-us/forums/intel-isa-extensions/topic/298374:

movd mm0, eax
punpcklbw mm0, mm0
pshufw mm0, mm0, 0x00
pand mm0, [mask8040201008040201h]
pcmpeqb mm0, [mask8040201008040201h]

However, this code only works with MMX registers and cannot be made to work with XMM registers, because pshufw has no XMM form.

I know I can use PSHUFB; however, that's SSSE3, and I would like SSE2 code because it needs to work on any AMD64 system.

Is there a way to do this in pure SSE2 code?
No intrinsics please, just plain Intel x64 code.

Johan
  • For those that are interested in the SSSE3 (and AVX2 for 32-bits) solution with intrinsics [here it is](http://stackoverflow.com/questions/24225786/fastest-way-to-broadcast-32-bits-in-32-bytes/24242696#24242696). – Z boson Feb 24 '16 at 08:56
  • @Zboson, SSSE3 is just a simple `PSHUFB`. – Johan Feb 24 '16 at 16:50

1 Answer

Luckily, pshufd is SSE2; you just need to unpack once more, so that each input byte fills a whole dword before the dword shuffle. I believe this should work:

movd xmm0, eax
punpcklbw xmm0, xmm0
punpcklbw xmm0, xmm0
pshufd xmm0, xmm0, 0x50
pand xmm0, [mask]
pcmpeqb xmm0, [mask]
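
To make the byte positions easier to follow, here is a minimal scalar C sketch that models the instructions used above on a 16-byte array. It is only a reasoning aid, not a replacement for the assembly. The helper names (`model_movd`, `model_punpcklbw_self`, `model_pshufd`, `expand_bits_model`, and so on) are made up for illustration. The layout of `[mask]` is also an assumption, since the answer does not spell it out: the question's 8040201008040201h constant repeated in both qwords, i.e. bytes 01 02 04 08 10 20 40 80 twice, lowest byte first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A 128-bit register modelled as 16 bytes; b[0] is the least significant byte. */
typedef struct { uint8_t b[16]; } xmm_t;

/* movd xmm, r32: low dword comes from the GPR, the upper 96 bits are zeroed. */
xmm_t model_movd(uint32_t eax) {
    xmm_t r;
    memset(&r, 0, sizeof r);
    for (int i = 0; i < 4; i++) r.b[i] = (uint8_t)(eax >> (8 * i));
    return r;
}

/* punpcklbw x, x: each of the low 8 bytes is duplicated into a byte pair. */
xmm_t model_punpcklbw_self(xmm_t x) {
    xmm_t r;
    for (int i = 0; i < 8; i++) { r.b[2 * i] = x.b[i]; r.b[2 * i + 1] = x.b[i]; }
    return r;
}

/* pshufd x, x, imm8: result dword i = source dword selected by imm8 bits [2i+1:2i]. */
xmm_t model_pshufd(xmm_t x, uint8_t imm) {
    xmm_t r;
    for (int i = 0; i < 4; i++) {
        int src = (imm >> (2 * i)) & 3;
        memcpy(&r.b[4 * i], &x.b[4 * src], 4);
    }
    return r;
}

/* Assumed mask (not given in the answer): byte i selects bit (i % 8). */
xmm_t model_mask(void) {
    xmm_t m;
    for (int i = 0; i < 16; i++) m.b[i] = (uint8_t)(1u << (i % 8));
    return m;
}

/* pand then pcmpeqb against the same mask: 0xFF where the selected bit is set. */
xmm_t model_pand_pcmpeqb(xmm_t x, xmm_t mask) {
    xmm_t r;
    for (int i = 0; i < 16; i++)
        r.b[i] = ((x.b[i] & mask.b[i]) == mask.b[i]) ? 0xFF : 0x00;
    return r;
}

/* The full sequence: movd, punpcklbw twice, pshufd 0x50, pand, pcmpeqb. */
xmm_t expand_bits_model(uint16_t bits) {
    xmm_t x = model_movd(bits);
    x = model_punpcklbw_self(x);   /* b0 b0 b1 b1 ...           */
    x = model_punpcklbw_self(x);   /* b0 x4, b1 x4, zeros       */
    x = model_pshufd(x, 0x50);     /* b0 x8 (low), b1 x8 (high) */
    return model_pand_pcmpeqb(x, model_mask());
}

/* Asserts that byte i is 0xFF exactly when bit i is set, for every 16-bit input. */
void check_expand_bits_model(void) {
    for (uint32_t v = 0; v <= 0xFFFF; v++) {
        xmm_t r = expand_bits_model((uint16_t)v);
        for (int i = 0; i < 16; i++)
            assert(r.b[i] == (((v >> i) & 1) ? 0xFF : 0x00));
    }
}
```

Calling `check_expand_bits_model()` from a small driver walks all 65,536 inputs. Under the assumed mask, the model agrees with the intended "byte i reflects bit i" mapping.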

Johan said:

If you're starting with a word, the first unpack will give you a dword, allowing you to shorten it like so:

movd xmm0, eax
punpcklbw xmm0, xmm0
pshufd xmm0, xmm0, 0x00
pand xmm0, [mask]
pcmpeqb xmm0, [mask]

However, this shortened code should not work. Example: assume the input is 0x00FF (word), that is, we want the low 8 bytes set.

punpcklbw xmm0, xmm0    ; 00 00 00 00 00 00 00 00 00 00 00 00 00 00 FF FF
pshufd xmm0, xmm0, 0x00 ; 00 00 FF FF 00 00 FF FF 00 00 FF FF 00 00 FF FF
pand xmm0, [mask]       ; 00 00 20 10 00 00 02 01 00 00 20 10 00 00 02 01
pcmpeqb xmm0, [mask]    ; 00 00 FF FF 00 00 FF FF 00 00 FF FF 00 00 FF FF

This is the wrong result because we wanted 00 00 00 00 00 00 00 00 FF FF FF FF FF FF FF FF. Sure, it does give you 8 set bytes, just not the 8 which correspond to the bits.
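
The same trace can be reproduced with a small scalar C sketch of the shortened sequence. The function names are hypothetical, and the mask layout (bytes 01 02 04 08 10 20 40 80, repeated twice, lowest byte first) is an assumption taken from the question's 8040201008040201h constant. It counts how many bytes differ from the wanted "byte i is 0xFF iff bit i is set" result.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of the shortened sequence; out[0] is the least significant byte. */
void short_version_model(uint16_t bits, uint8_t out[16]) {
    uint8_t x[16] = {0}, t[16];

    /* movd xmm0, eax */
    x[0] = (uint8_t)bits;
    x[1] = (uint8_t)(bits >> 8);

    /* punpcklbw xmm0, xmm0 (a single unpack only) */
    for (int i = 0; i < 8; i++) { t[2 * i] = x[i]; t[2 * i + 1] = x[i]; }

    /* pshufd xmm0, xmm0, 0x00: broadcast the lowest dword to all four dwords */
    for (int i = 0; i < 4; i++) memcpy(&x[4 * i], &t[0], 4);

    /* pand + pcmpeqb with the assumed mask: byte i selects bit (i % 8) */
    for (int i = 0; i < 16; i++) {
        uint8_t m = (uint8_t)(1u << (i % 8));
        out[i] = ((x[i] & m) == m) ? 0xFF : 0x00;
    }
}

/* Counts bytes that differ from the wanted result for the given input. */
int short_version_mismatches(uint16_t bits) {
    uint8_t out[16];
    int bad = 0;
    short_version_model(bits, out);
    for (int i = 0; i < 16; i++) {
        uint8_t want = ((bits >> i) & 1) ? 0xFF : 0x00;
        if (out[i] != want) bad++;
    }
    return bad;
}

/* The 0x00FF example from the text: the shortened sequence does not match. */
void check_short_version_fails(void) {
    assert(short_version_mismatches(0x00FF) != 0);
}
```

For the input 0x00FF, the model sets bytes 0, 1, 4, 5, 8, 9, 12 and 13, which matches the last line of the trace above rather than the wanted low 8 bytes.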

Johan
Jester
  • Yep, that works, and it can be shortened with a `SHUFPS`, but I'm not sure whether mixing integer and floating-point SSE instructions invokes a penalty. (I vaguely recall a penalty for generating non-normalized singles in SSE instructions that deal with those.) – Johan Feb 24 '16 at 00:04
  • No, your `SHUFPS` version is wrong. `SHUFPS` also shuffles dwords just like `PSHUFD` so the extra unpack is needed. – Jester Feb 24 '16 at 00:07
  • Ehm, the first and second version work (TM) on my machine. I start out with a word, so the first unpack gives me a dword. – Johan Feb 24 '16 at 00:12
  • @Johan see update. Don't you agree? Also, what's the point of `SHUFPS`? It does exactly what `PSHUFD` does, so you could then just leave the `PSHUFD` in. – Jester Feb 24 '16 at 00:18
  • Bummer, missed that, however the short version might still be useful, if you're able to deal with the mixing of the bits, fully agree with the uselessness of `SHUFPS`. It does give you a result you can -cough sort of cough- work with, at the cost of extra complexity. – Johan Feb 24 '16 at 00:25
  • Yeah the short version works, if you mangle the mask in the same way the input gets mangled. The single instruction saved does not compensate for the confusion though and will likely require additional work later on to compensate, good call and many thanks. – Johan Feb 24 '16 at 00:48
  • No it can't work. The result after the `pshufd` already has zeroes, no amount of masking can turn those into non-zero. Just look at the lowest dword `00 00 FF FF` has to somehow give `FF FF FF FF`. Or by "works" you mean that it gives mangled result? Yeah, that could be true ;) – Jester Feb 24 '16 at 00:51
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/104350/discussion-between-johan-and-jester). – Johan Feb 24 '16 at 00:53