
I think the `vpshufb` instruction would work well for something I'm trying to do, but I don't understand how the shuffle control mask determines where each byte of the vector ends up, and I haven't been able to find an explanation of this online.

Intrinsic version of the instruction: `_mm256_shuffle_epi8(__m256i a, __m256i b)`

  • 3
    See: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm256_shuffle_epi8&expand=4754,4754 – Paul R Jun 25 '18 at 19:57
  • 1
    See [Convert \_mm\_shuffle\_epi32 to C expression for the permutation?](https://stackoverflow.com/q/37084379), but replace `_MM_SHUFFLE` with `_mm_set_epi8(i15, i14, i13, ..., i1, i0)` because `shuffle_epi8` uses a vector variable instead of bits packed into an integer. (And the 256b version does two separate shuffles in the low and high 128-bit lanes. See http://felixcloutier.com/x86/PSHUFB.html) – Peter Cordes Jun 25 '18 at 21:43
  • Just remember that shuffle does NOT cross lanes in the AVX2 version (a common mistake/misunderstanding made when using AVX2). – ChipK Jun 26 '18 at 15:03
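
To make the points in the comments above concrete, here is a minimal scalar sketch in C of what the 256-bit byte shuffle does with its control vector. It is a model for reasoning about the mask, not the intrinsic itself: the names `shuffle_epi8_model` and `make_reverse_in_lane_ctrl` are hypothetical helpers invented for this illustration. Each control byte selects one source byte from the same 128-bit lane using its low 4 bits, and a control byte with its top bit set produces zero.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Scalar model of the 256-bit vpshufb (_mm256_shuffle_epi8(a, ctrl)).
 * Hypothetical helper for illustration only; it does not use any intrinsics.
 *
 * For every output byte i (0..31):
 *   - if bit 7 of ctrl[i] is set, the output byte is 0;
 *   - otherwise the output byte is a[lane_base + (ctrl[i] & 0x0F)],
 *     where lane_base is 0 for bytes 0..15 and 16 for bytes 16..31.
 * Bits 4..6 of the control byte are ignored, so a byte can never be
 * pulled from the other 128-bit lane.
 */
void shuffle_epi8_model(uint8_t dst[32], const uint8_t a[32],
                        const uint8_t ctrl[32])
{
    uint8_t tmp[32]; /* temporary so that dst may alias a or ctrl */

    for (size_t i = 0; i < 32; i++) {
        size_t lane_base = i & 16; /* 0 = low lane, 16 = high lane */

        if (ctrl[i] & 0x80)
            tmp[i] = 0;
        else
            tmp[i] = a[lane_base + (ctrl[i] & 0x0F)];
    }

    for (size_t i = 0; i < 32; i++)
        dst[i] = tmp[i];
}

/*
 * Hypothetical example control: reverse the byte order inside each
 * 128-bit lane. Output byte i takes source byte 15 - (i % 16) of its lane.
 */
void make_reverse_in_lane_ctrl(uint8_t ctrl[32])
{
    for (size_t i = 0; i < 32; i++)
        ctrl[i] = (uint8_t)(15 - (i & 15));
}
```

With the real intrinsic, the same control bytes would be loaded into a `__m256i` (for example with `_mm256_setr_epi8`, which takes the bytes in memory order) and passed as the second argument. Because the model, like the instruction, never reads across the lane boundary, moving a byte between lanes would need a separate lane-crossing permute in addition to the byte shuffle.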
