I am a bit puzzled by the behaviour of
_mm512_mask_i32scatter_epi32(void* base_addr, __mmask16 mask, __m512i idx, __m512i data, int scale)
This intrinsic should scatter the 32-bit integers from the data register using the 32-bit indices from the idx register. A value is stored only if the corresponding bit is set in the mask register. According to the official documentation, each value is stored at base_addr plus the corresponding offset from the idx register, where each index is multiplied by scale (1, 2, 4 or 8) to form the offset.
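To make that description concrete, here is a minimal scalar sketch of how I read the documented semantics. The function name scatter_model is hypothetical and not part of any library, and the sketch assumes the scaled index is a byte offset from base_addr, as in Intel's pseudocode for the intrinsic.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar sketch of a masked 32-bit scatter over 16 lanes.
   Assumption: lane i goes to byte address base + idx[i] * scale. */
static void scatter_model(void *base, uint16_t mask,
                          const int32_t idx[16], const int32_t data[16],
                          int scale)
{
    for (int i = 0; i < 16; ++i) {
        if ((mask >> i) & 1u) {
            /* The offset is counted in bytes, not in 32-bit elements. */
            char *dst = (char *)base + (ptrdiff_t)idx[i] * scale;
            memcpy(dst, &data[i], sizeof(int32_t));
        }
    }
}
```

Under this reading, an index of 1 with a scale of 4 addresses the second 32-bit element of the destination array, while an index of 1 with a scale of 1 addresses the second byte.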
My data register (data_reg) looks like this:
[ 0] = 4 [ 4] = 0 [ 8] = 0 [12] = 0
[ 1] = 5 [ 5] = 0 [ 9] = 0 [13] = 0
[ 2] = 4 [ 6] = 0 [10] = 0 [14] = 0
[ 3] = 0 [ 7] = 0 [11] = 0 [15] = 0
The index register (idx_reg) looks like this:
[ 0] = 0 [ 4] = 8 [ 8] = 16 [12] = 24
[ 1] = 2 [ 5] = 10 [ 9] = 18 [13] = 26
[ 2] = 4 [ 6] = 12 [10] = 20 [14] = 28
[ 3] = 6 [ 7] = 14 [11] = 22 [15] = 30
The mask register (mask_reg) looks like this:
[ 0] = 1 [ 4] = 0 [ 8] = 0 [12] = 0
[ 1] = 1 [ 5] = 0 [ 9] = 0 [13] = 0
[ 2] = 1 [ 6] = 0 [10] = 0 [14] = 0
[ 3] = 0 [ 7] = 0 [11] = 0 [15] = 0
I call the intrinsic like this:
_mm512_mask_i32scatter_epi32( result_array, mask_reg, idx_reg, data_reg, 1);
The resulting data (result_array, an array of 32-bit integers) looks like this:
[ 0] = 327684 [ 4] = 0 [ 8] = 0 [12] = 0
[ 1] = 4 [ 5] = 0 [ 9] = 0 [13] = 0
[ 2] = 0 [ 6] = 0 [10] = 0 [14] = 0
[ 3] = 0 [ 7] = 0 [11] = 0 [15] = 0
but I expected it to look like this:
[ 0] = 4 [ 4] = 4 [ 8] = 0 [12] = 0
[ 1] = 0 [ 5] = 0 [ 9] = 0 [13] = 0
[ 2] = 5 [ 6] = 0 [10] = 0 [14] = 0
[ 3] = 0 [ 7] = 0 [11] = 0 [15] = 0
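As a cross-check, the two layouts above can be reproduced without AVX-512 by storing the three unmasked lanes by hand. This is only a sketch: the helper store_three_lanes is hypothetical, it assumes that index * scale is a byte offset, and it assumes a little-endian machine such as x86.

```c
#include <stdint.h>
#include <string.h>

/* Store the three unmasked lanes (values 4, 5, 4 at indices 0, 2, 4)
   into a zeroed array of 16 ints.
   Assumption: index * scale is a byte offset into the array. */
static void store_three_lanes(int32_t out[16], int scale)
{
    const int32_t idx[3]  = {0, 2, 4};
    const int32_t data[3] = {4, 5, 4};

    memset(out, 0, 16 * sizeof(int32_t));
    for (int i = 0; i < 3; ++i) {
        /* With scale 1 the 4-byte stores overlap; with scale 4 they do not. */
        memcpy((char *)out + idx[i] * scale, &data[i], sizeof(int32_t));
    }
}
```

With scale 1 the stores land at bytes 0, 2 and 4, so the first integer becomes 4 + 5 * 65536 = 327684 and the second becomes 4, which matches the result I see. With scale 4 the stores land in elements 0, 2 and 4, which matches the layout I expected.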
Did I miss something, or is this behaviour strange?
Sincerely