What is the `MOVMSKB` instruction used for?

I tried to find documentation for it, but I could not find any relevant information.

Peter Cordes
  • There is `PMOVMSKB`; did you mean that? https://www.felixcloutier.com/x86/pmovmskb – pavelbere May 16 '19 at 08:28
  • No, I can find `PMOVMSKB`, `MOVMSKPS`, and `MOVMSKPD`, but I cannot find `MOVMSKB`, which is what I am looking for. – TIANYANG ZHANG May 16 '19 at 09:43
  • There is no instruction named `MOVMSKB` in the Intel ISA. – zx485 May 16 '19 at 12:20
  • Why do you think there is a `MOVMSKB` instruction separate from `PMOVMSKB`? There isn't, so perhaps your real question should be about how to interpret whatever you were reading? Maybe the `_mm_movemask_epi8` intrinsic? And BTW, the use-cases for `pmovmskb` include search loops like `strlen` or `memchr`, as well as an index for a lookup table of `pshufb` masks e.g. for left-packing, or parsing IPv4 dotted-quad strings into integers. – Peter Cordes May 16 '19 at 12:27
  • That instruction is mentioned in a paper, [link](https://eprint.iacr.org/2016/768.pdf) (just above section 5.2). @PeterCordes – TIANYANG ZHANG May 17 '19 at 09:23

1 Answer

The paper you're reading describes in the next sentence exactly what it does:

This instruction creates a 16-bit mask from the most significant bits of 16 signed or unsigned 8-bit integers in a register and zeroes the upper bits [of the destination]

That's exactly what `pmovmskb` does on an XMM register, so obviously that's the instruction they're talking about. They intentionally or accidentally left out the `p` (for packed-integer) from the mnemonic.
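As a minimal sketch of that behaviour, the C code below uses the `_mm_movemask_epi8` intrinsic (which compilers normally turn into `pmovmskb`) and compares it against a plain scalar loop. The helper names `msb_mask16`, `msb_mask16_scalar` and `msb_mask16_demo` are made up for this illustration and do not come from the paper.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_movemask_epi8 */

/* Hypothetical helper: gather the most significant bit of each of
 * 16 bytes into bits 0..15 of the result; the upper bits are zero. */
static int msb_mask16(const uint8_t bytes[16])
{
    __m128i v = _mm_loadu_si128((const __m128i *)bytes);
    return _mm_movemask_epi8(v);  /* typically compiles to pmovmskb */
}

/* Scalar reference with the same result, for comparison. */
static int msb_mask16_scalar(const uint8_t bytes[16])
{
    int mask = 0;
    for (int i = 0; i < 16; i++)
        mask |= (bytes[i] >> 7) << i;  /* bit i = top bit of byte i */
    return mask;
}

/* Usage example: bytes 0, 3 and 15 have their top bit set. */
static void msb_mask16_demo(void)
{
    uint8_t bytes[16] = {0};
    bytes[0]  = 0x80;
    bytes[3]  = 0xFF;
    bytes[7]  = 0x7F;  /* top bit clear, so bit 7 of the mask stays 0 */
    bytes[15] = 0x90;
    assert(msb_mask16(bytes) == 0x8009);
    assert(msb_mask16(bytes) == msb_mask16_scalar(bytes));
}
```

Whether the bytes are treated as signed or unsigned makes no difference here, since only the top bit of each byte is read.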

Their diagram of how it works is (incorrectly) labeled with `vpmovmskb reg, ymm1`. With a YMM source, `vpmovmskb` produces a 32-bit mask, not a 16-bit one.

(Although if the input YMM register has been written via the XMM low half with a VEX-encoded instruction like `vpxor xmm1, xmm2, xmm3`, then the upper 16 bytes would be all zero, so they'd get the result they described for a different reason.)


Its use-cases include search loops like `strlen` or `memchr` (where `lzcnt` / `tzcnt` are useful to find which element matched once the mask shows a match or mismatch).
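A rough sketch of such a search loop in C follows. It is not how any particular libc implements `memchr`; the function name `find_byte_sse2` is invented for this example, unaligned loads are used for simplicity, and `__builtin_ctz` is a GCC/Clang builtin (assumed available) that does the bit scan.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <emmintrin.h>  /* SSE2 */

/* Hypothetical helper: index of the first byte equal to c in
 * buf[0..len), or len if there is none. */
static size_t find_byte_sse2(const uint8_t *buf, size_t len, uint8_t c)
{
    const __m128i needle = _mm_set1_epi8((char)c);
    size_t i = 0;

    for (; i + 16 <= len; i += 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i));
        __m128i eq = _mm_cmpeq_epi8(chunk, needle);       /* 0xFF where equal */
        unsigned mask = (unsigned)_mm_movemask_epi8(eq);  /* pmovmskb */
        if (mask != 0)
            return i + (size_t)__builtin_ctz(mask);  /* lowest set bit = first match */
    }
    for (; i < len; i++)  /* scalar tail for the last len % 16 bytes */
        if (buf[i] == c)
            return i;
    return len;
}

/* Usage example. */
static void find_byte_demo(void)
{
    const char *s = "pmovmskb turns a compare into a bitmask";
    size_t n = strlen(s);
    assert(find_byte_sse2((const uint8_t *)s, n, 'k') == 6);   /* first chunk */
    assert(find_byte_sse2((const uint8_t *)s, n, 'c') == 17);  /* second chunk */
    assert(find_byte_sse2((const uint8_t *)s, n, 'z') == n);   /* not present */
}
```

The compare produces a vector, but a branch needs a scalar; `pmovmskb` is the step that turns the per-byte compare result into an integer that can be tested and bit-scanned.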

Or creating an index for a lookup table of `pshufb` masks, e.g. for left-packing, or even as part of parsing IPv4 dotted-quad strings into integers (see the Q&A "Fastest way to get IPv4 address from string").
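To illustrate the lookup-table idea, here is a small sketch that left-packs the non-zero bytes of an 8-byte block. Everything in it is an assumption made for the example: the names `pack_lut`, `build_pack_lut` and `left_pack_nonzero8`, the choice of "non-zero" as the keep condition, and the use of only 8 bytes so the table has 256 entries. It needs SSSE3 for `pshufb` (`_mm_shuffle_epi8`) and relies on the GCC/Clang `target` attribute and `__builtin_popcount`.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 */
#include <tmmintrin.h>  /* SSSE3: _mm_shuffle_epi8 (pshufb) */

/* pack_lut[m] lists the indices of the set bits of m in ascending order,
 * padded with 0x80 (pshufb writes zero for a control byte with bit 7 set). */
static uint8_t pack_lut[256][8];

static void build_pack_lut(void)
{
    for (int m = 0; m < 256; m++) {
        int k = 0;
        for (int bit = 0; bit < 8; bit++)
            if (m & (1 << bit))
                pack_lut[m][k++] = (uint8_t)bit;
        while (k < 8)
            pack_lut[m][k++] = 0x80;
    }
}

/* Hypothetical helper: move the non-zero bytes of in[] to the front of
 * out[], zero-fill the rest, and return how many bytes were kept.
 * build_pack_lut() must have been called first. */
__attribute__((target("ssse3")))
static int left_pack_nonzero8(const uint8_t in[8], uint8_t out[8])
{
    __m128i v = _mm_loadl_epi64((const __m128i *)in);
    __m128i is_zero = _mm_cmpeq_epi8(v, _mm_setzero_si128());
    /* pmovmskb gives one bit per byte; invert and keep the low 8 bits. */
    unsigned keep = ~(unsigned)_mm_movemask_epi8(is_zero) & 0xFFu;
    __m128i ctrl = _mm_loadl_epi64((const __m128i *)pack_lut[keep]);
    __m128i packed = _mm_shuffle_epi8(v, ctrl);
    _mm_storel_epi64((__m128i *)out, packed);
    return __builtin_popcount(keep);
}

/* Usage example. */
static void left_pack_demo(void)
{
    const uint8_t in[8] = {0, 5, 0, 7, 9, 0, 0, 1};
    uint8_t out[8];
    build_pack_lut();
    assert(left_pack_nonzero8(in, out) == 4);
    assert(out[0] == 5 && out[1] == 7 && out[2] == 9 && out[3] == 1);
    assert(out[4] == 0 && out[7] == 0);
}
```

The mask from `pmovmskb` is what makes the table lookup possible: it compresses the compare result into a small integer that can index an array of shuffle controls.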

Peter Cordes