
I'm trying to learn shuffling using this example in C from the GCC manual:

typedef int v4si __attribute__ ((vector_size (16)));
     
v4si a = {1,2,3,4};
v4si b = {5,6,7,8};
v4si mask = {0,4,2,5};
v4si res = __builtin_shuffle (a, b, mask);    /* res is {1,5,3,6}  */

I don't understand what the mask does exactly. All I can find online is similar to this:

The shuffle mask operand specifies, for each element of the result vector, which element of the two input vectors the result element gets

But it doesn't explain how. Is there an AND or OR going on? What do the numbers in the mask mean?

Peter Cordes
Josh

2 Answers


mask isn't an AND mask; the shuffle-control vector is a vector of indices into the concatenation of the source vectors. Each result element is basically the result of res[i] = ab[ mask[i] ].

SIMD shuffles are parallel table-lookups, where the control vector (often called a "mask" for short, even though it isn't used as a bitmask) is a vector of indices and the other inputs are the table.
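As a rough scalar model of `res[i] = ab[ mask[i] ]`, here is a sketch in plain C. `shuffle2_model` is a made-up helper name for illustration, not a GCC function. The `& 7` reflects the GCC manual's statement that mask elements are taken modulo 2*N in the two-operand case (N = 4 here).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: scalar model of a two-input, 4-element shuffle. */
static void shuffle2_model(const int a[4], const int b[4],
                           const int mask[4], int res[4])
{
    assert(a != NULL && b != NULL && mask != NULL && res != NULL);

    int ab[8];                          /* the "table": a followed by b */
    for (size_t i = 0; i < 4; i++) {
        ab[i]     = a[i];               /* indices 0..3 come from a */
        ab[i + 4] = b[i];               /* indices 4..7 come from b */
    }
    for (size_t i = 0; i < 4; i++) {
        /* Index taken modulo 8; & 7 also wraps negative values on
           two's-complement targets. */
        res[i] = ab[mask[i] & 7];
    }
}
```

With the question's inputs, mask element 1 is `4`, so `res[1] = ab[4]`, which is the first element of `b`, i.e. `5`. Nothing else happens to that value: it is simply copied into position 1 of the result.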

Related: Convert _mm_shuffle_epi32 to C expression for the permutation? shows a plain C equivalent for _mm_shuffle_epi32 (pshufd) with compile-time-constant indices. You have a 2-input shuffle that indexes into the concatenation of a and b (in that order).

AVX1/AVX2 don't have a single shuffle instruction that does this for runtime-variable indices, so a __builtin_shuffle whose mask isn't a compile-time constant would have to compile to multiple instructions.

AVX512F vpermt2d works exactly this way, though.
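To try the runtime-variable case yourself, something like the following sketch should work. `shuffle_runtime` and `demo_runtime_mask` are hypothetical names, and the `#else` branch is only a lane-by-lane fallback I'm assuming for compilers that lack `__builtin_shuffle` (it is a GCC builtin). Which instructions GCC emits depends on the target options, so inspect the asm rather than trusting any particular guess.

```c
#include <assert.h>

typedef int v4si __attribute__ ((vector_size (16)));

/* Two-input shuffle whose mask is an ordinary run-time value. */
static v4si shuffle_runtime(v4si a, v4si b, v4si mask)
{
#if defined(__GNUC__) && !defined(__clang__)
    return __builtin_shuffle(a, b, mask);   /* GCC chooses the instruction sequence */
#else
    /* Assumed fallback with the same semantics, one lane at a time */
    v4si res = {0, 0, 0, 0};
    for (int i = 0; i < 4; i++) {
        int idx = mask[i] & 7;              /* index modulo 2*N */
        res[i] = idx < 4 ? a[idx] : b[idx - 4];
    }
    return res;
#endif
}

/* Usage sketch: a 4-wide window into ab, starting at a run-time offset. */
static void demo_runtime_mask(int first)
{
    v4si a = {1, 2, 3, 4};
    v4si b = {5, 6, 7, 8};
    v4si mask = {first, first + 1, first + 2, first + 3};
    v4si res = shuffle_runtime(a, b, mask);
    for (int i = 0; i < 4; i++) {
        /* For this a and b, ab[k] == k + 1 */
        assert(res[i] == ((first + i) & 7) + 1);
    }
}
```

Calling `demo_runtime_mask(2)` uses the mask `{2,3,4,5}` and selects `{3,4,5,6}`, straddling the boundary between `a` and `b`.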

Peter Cordes
  • Thanks but I still don't understand. As an example, looking at the second mask which is a `4`, what does it do? – Josh Jul 15 '19 at 01:42
  • @Josh: it indexes element #4 from `{1,2,3,4, 5,6,7,8};` which is `5`. Like I said, the "table" is the concatenation of the 2 input vectors. – Peter Cordes Jul 15 '19 at 01:45
  • thanks a lot, `ab[ mask[i] ]` helps picturing it, now for the final clarification, could you go through the full example with mask `4`. so now we have `5`, then what? it just places 5 in the dest index? that's it? – Josh Jul 15 '19 at 01:54

Example:

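const int start = 20;
const int length = 32;
// Enumerable.Range takes (start, count), so this yields 20..51
var arr1 = Enumerable.Range(start, length).ToArray();
// Pin the array so the pointer stays valid (requires an unsafe context)
using var handle = arr1.AsMemory().Pin();
var arr1LeftPtr = (int*)handle.Pointer;

Vector128<int> left = Sse2.LoadVector128(arr1LeftPtr);  // left: 20, 21, 22, 23

// The immediate holds four 2-bit source indices; the lowest two bits pick result element 0
Vector128<int> reversedLeft = Sse2.Shuffle(left, 0b00_01_10_11);  // result: 23, 22, 21, 20 (reversed)
Vector128<int> reversedLeft2 = Sse2.Shuffle(left, 0b11_10_01_00); // result: 20, 21, 22, 23 (identity, unchanged)
Vector128<int> reversedRight = Sse2.Shuffle(left, 0b00_01_00_01); // result: 21, 20, 21, 20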
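For comparison with the question's C, the immediates above can be decoded with a small scalar model of the one-input `pshufd` shuffle. This is only a sketch: `pshufd_model` is a hypothetical helper name, and element 0 is the lowest element, matching the array-order comments in the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: scalar model of pshufd on four 32-bit elements. */
static void pshufd_model(const int src[4], uint8_t imm8, int res[4])
{
    assert(src != res);   /* in-place use would read overwritten elements */
    for (int i = 0; i < 4; i++) {
        /* 2-bit field i of the immediate is the source index for result element i */
        unsigned idx = (imm8 >> (2 * i)) & 3u;
        res[i] = src[idx];
    }
}
```

For instance, `0b00_01_10_11` is `0x1B`: the lowest field is `11` (3), so result element 0 is source element 3, and the remaining fields continue down to 0, which reverses the vector.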
Lydon Ch
  • Just for the record, this is C#, with System.Runtime.Intrinsics intrinsics, not C with Intel's intrinsics, or GCC's native vector extensions (https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html) which this question is using. Also, your comments on vector contents appear to be written in C/C# array-initializer order, with lowest element at the left, not the place-value ordering of highest element at the left which makes vector left-shift make sense. Your choice is a common one, but opposite of Intel's diagrams in their docs. – Peter Cordes Dec 27 '22 at 17:51
  • thanks for the comment, now I know better :-) – Lydon Ch Dec 28 '22 at 15:49