I am a bit curious about the behaviour of

_mm512_mask_i32scatter_epi32(void* base_addr, __mmask16 mask, __m512i idx, __m512i data, int scale)

This intrinsic should scatter the 32-bit integers from the data register using the 32-bit indices from the idx register. A value is stored only if the corresponding bit is set in the mask register. According to the official documentation, each value is stored at base_addr plus the corresponding offset from the idx register, and scale is used to scale that offset.

My data register (data_reg) looks like this:

[ 0] = 4    [ 4] = 0    [ 8] = 0    [12] = 0    
[ 1] = 5    [ 5] = 0    [ 9] = 0    [13] = 0    
[ 2] = 4    [ 6] = 0    [10] = 0    [14] = 0    
[ 3] = 0    [ 7] = 0    [11] = 0    [15] = 0

The index register (idx_reg) looks like this:

[ 0] = 0    [ 4] = 8    [ 8] = 16   [12] = 24   
[ 1] = 2    [ 5] = 10   [ 9] = 18   [13] = 26   
[ 2] = 4    [ 6] = 12   [10] = 20   [14] = 28   
[ 3] = 6    [ 7] = 14   [11] = 22   [15] = 30   

The mask register (mask_reg) looks like this:

[ 0] = 1    [ 4] = 0    [ 8] = 0    [12] = 0    
[ 1] = 1    [ 5] = 0    [ 9] = 0    [13] = 0    
[ 2] = 1    [ 6] = 0    [10] = 0    [14] = 0    
[ 3] = 0    [ 7] = 0    [11] = 0    [15] = 0    

I call the intrinsic like this:

_mm512_mask_i32scatter_epi32( result_array, mask_reg, idx_reg, data_reg, 1);

The resulting data (result_array) looks like this:

[ 0] = 327684   [ 4] = 0    [ 8] = 0    [12] = 0    
[ 1] = 4        [ 5] = 0    [ 9] = 0    [13] = 0    
[ 2] = 0        [ 6] = 0    [10] = 0    [14] = 0    
[ 3] = 0        [ 7] = 0    [11] = 0    [15] = 0    

but it SHOULD look like this:

[ 0] = 4        [ 4] = 4    [ 8] = 0    [12] = 0    
[ 1] = 0        [ 5] = 0    [ 9] = 0    [13] = 0    
[ 2] = 5        [ 6] = 0    [10] = 0    [14] = 0    
[ 3] = 0        [ 7] = 0    [11] = 0    [15] = 0    

Did I miss something, or is this behaviour strange?

Sincerely

Hymir

1 Answer

With a scale of 1, the indices are treated as byte offsets, not element offsets: each value is stored at base_addr + idx * scale bytes. You need to either multiply the indices in idx_reg by sizeof(int32_t), or pass sizeof(int32_t) as the scale parameter instead of 1.
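
To make the address arithmetic concrete, here is a minimal scalar sketch of what the scatter does. It is a model, not the intrinsic itself: the function name scatter_model is hypothetical, the arrays stand in for the vector registers, and it assumes a little-endian target (as on x86) and that lanes are written in order from lane 0 upwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar model (hypothetical helper, not an Intel API) of a masked
   32-bit scatter: lane i is stored at byte address
   base + idx[i] * scale, but only when bit i of mask is set. */
void scatter_model(void *base, uint16_t mask,
                   const int32_t idx[16], const int32_t data[16],
                   int scale)
{
    /* The real instruction only accepts these scale factors. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);

    unsigned char *bytes = (unsigned char *)base;
    for (int i = 0; i < 16; ++i) {
        if (mask & (1u << i)) {
            /* memcpy tolerates the unaligned, overlapping stores
               that a scale of 1 produces with indices 0, 2, 4. */
            memcpy(bytes + (ptrdiff_t)idx[i] * scale,
                   &data[i], sizeof data[i]);
        }
    }
}
```

With the values from the question (data 4, 5, 4, indices 0, 2, 4, mask 0x0007) and a scale of 1, the three stores land at bytes 0, 2 and 4 and overlap, so element 0 ends up as 0x00050004 = 327684 and element 1 as 4, which is exactly the observed output. With a scale of 4 the stores land at bytes 0, 8 and 16, i.e. elements 0, 2 and 4. The corresponding change to the original call would be to pass sizeof(int32_t) as the last argument of _mm512_mask_i32scatter_epi32.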

See also this related question.

Paul R