The permute command from the AVX2 instructions takes a parameter of type imm8. This parameter controls how the permutation is performed. Unfortunately, I do not understand how this imm8 parameter is "created". What value do I have to set, or how can I determine which value to set for a specific permutation?

Example: `_mm256_permute_pd(vec2, 0x5);`

Here the parameter 0x5 permutes the first and second double in vec2 and the third and fourth double in vec2. But how do I know that 0x5 does that?

Peter Cordes
vydesaster
  • `_mm256_permute_pd` only requires AVX1, it's `vpermilpd`, an in-lane permute. Were you mixing it up with AVX2 `_mm256_permute4x64_pd` (`vpermpd`)? Intel's intrinsic names can be confusing when trying to match them up with asm instructions. – Peter Cordes Dec 22 '18 at 02:53

1 Answer

The immediate is four 1-bit indices, one per destination element. Each index selects one of the two elements from the corresponding 128-bit lane of the source vector. Read the Operation section of the docs for the asm instruction: http://felixcloutier.com/x86/VPERMILPD.html.

Or look it up in Intel's intrinsics guide, which has similar pseudo-code that shows exactly how each bit selects the source for an element of the result.

This is not the lane-crossing `vpermpd`, so the immediate does not use the 2-bit indices that `_MM_SHUFFLE` is a helper macro for. That means it's not quite like the case in "Convert _mm_shuffle_epi32 to C expression for the permutation?".
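
To make the bit-to-element mapping concrete, here is a minimal scalar sketch of what the immediate form of `vpermilpd` does. The function name `permute_pd_model` is made up for illustration; it is a model of the documented behaviour, not Intel's code, and it assumes `dst` and `src` do not overlap.

```c
#include <stddef.h>

/* Scalar model of _mm256_permute_pd (vpermilpd with an immediate).
   Hypothetical helper for illustration only.
   src and dst each hold 4 doubles: elements 0-1 are the low 128-bit lane,
   elements 2-3 the high lane. Bit i of imm8 chooses, for dst[i], the low (0)
   or high (1) element of the SAME lane in src. dst must not alias src. */
static void permute_pd_model(double dst[4], const double src[4], unsigned imm8)
{
    for (size_t i = 0; i < 4; i++) {
        size_t lane_base = i & ~(size_t)1;   /* 0 for elements 0-1, 2 for 2-3 */
        size_t select = (imm8 >> i) & 1u;    /* the 1-bit index for dst[i]    */
        dst[i] = src[lane_base + select];
    }
}
```

With `src = {10, 11, 12, 13}` (element 0 first), an immediate of `0x5` sets bits 0 and 2, so the model produces `{11, 10, 13, 12}`: each pair is swapped within its own lane, which is the behaviour described in the question.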

Peter Cordes
  • I am sorry, but I do not really get it. Could you please explain it with an example? The value of the bits is given in hex, isn't it? – vydesaster Dec 22 '18 at 21:49
  • @vydesaster: The immediate byte is a number. Hex is one way to specify it, but it's C so any way of writing a numeric constant is valid. Base 2 is a good choice because each bit is a separate field. e.g. `0b0101` is another way to write `5` or `0x5`, the same constant. The low bit is the selector for the low element of the destination, and it selects element 1 from the low lane of the source. `0b1010` is the identity shuffle that copies the input unchanged. – Peter Cordes Dec 22 '18 at 21:53
  • Ok, this makes sense now. I think I got it. Thank you very much. – vydesaster Dec 23 '18 at 11:22
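
Since each bit is a separate field, one way to avoid working out the constant by hand is a small macro that assembles it from the four selectors. This is only a sketch: `PERMUTE_PD_IMM` and the enum names are hypothetical and not part of any Intel header. It also sidesteps `0b` literals, which are a compiler extension in C before C23.

```c
/* Hypothetical helper (not from any Intel header): build the imm8 for
   _mm256_permute_pd from four 1-bit selectors. Arguments are listed from
   the highest destination element (s3) down to the lowest (s0), mirroring
   the order of digits in a binary literal. */
#define PERMUTE_PD_IMM(s3, s2, s1, s0) \
    ((((s3) & 1) << 3) | (((s2) & 1) << 2) | (((s1) & 1) << 1) | ((s0) & 1))

enum {
    IMM_SWAP_PAIRS   = PERMUTE_PD_IMM(0, 1, 0, 1), /* 0x5: swap within each lane   */
    IMM_IDENTITY     = PERMUTE_PD_IMM(1, 0, 1, 0), /* 0xA: copy input unchanged     */
    IMM_DUP_LOW_ELEM = PERMUTE_PD_IMM(0, 0, 0, 0)  /* 0x0: low element of each lane,
                                                      duplicated                    */
};
```

Because the macro expands to an integer constant expression, a value such as `IMM_SWAP_PAIRS` should be usable wherever the intrinsic requires a compile-time immediate, for example `_mm256_permute_pd(vec2, IMM_SWAP_PAIRS)`.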