
I have recently been working on the NAMD benchmark, and I need to convert some of the AVX-512 intrinsics it uses to their AVX2 (256-bit) equivalents.

One of these intrinsics is _mm512_mask_i32gather_epi32. The original code uses it as follows:

const __m512i type_j = _mm512_mask_i32gather_epi32(type_j, r2mask, index, memory_ptr, _MM_SCALE_4); 

See Intel intrinsic page.

type_j is declared on the same line where it is used, so it plays the role of the src vector described on the intrinsics page. We are unable to reproduce the same output with the 256-bit version of the code. For lanes whose mask bit is not set, the 512-bit version still returns what looks like an input value, and our converted version does not. Can someone explain the role of the src vector and how the intrinsic works internally?

Sample, for a mask of 3101 (binary 0000110000011101):
output from 512 version: 
Printing values of 512
48  0   75  75  48  48  48  48  48  75  48  75  48  75  48  48  
Printing converted values
48  255 75  75  48  241 241 242 240 250 48  75  248 245 229 232 

The values match wherever the mask bit is set to 1.
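To check that observation against the sample, the mask can be decoded lane by lane. The sketch below is only an illustration: the helper names lane_is_set and print_set_lanes are hypothetical, and it assumes the printed rows list element 0 first and that bit i of the mask controls element i, as for a 16-bit __mmask16.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if `lane` (0 = lowest element) is
 * selected by a 16-bit, __mmask16-style mask, otherwise 0. */
int lane_is_set(uint16_t mask, int lane) {
    assert(lane >= 0 && lane < 16);
    return (mask >> lane) & 1;
}

/* Hypothetical helper: prints the selected lanes, lowest lane first. */
void print_set_lanes(uint16_t mask) {
    for (int lane = 0; lane < 16; ++lane) {
        if (lane_is_set(mask, lane)) {
            printf("%d ", lane);
        }
    }
    printf("\n");
}
```

Calling print_set_lanes(3101) lists lanes 0, 2, 3, 4, 10 and 11. Under the assumption above, those are exactly the positions where the two printed rows agree (48, 75, 75, 48, 48, 75). Every other position is a lane whose mask bit is 0.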

Peter Cordes
Sai krishna
  • 2
    Ignoring the problem of your compiler allowing you to use a variable in its own declaration, I suspect the issue is that, for mask bits of 0, you're pulling from an uninitialized `type_j`, which is undefined behavior. Thus, you get different values. – Drew McGowen Jun 21 '22 at 05:22
  • 1
    Not sure what your actual question is. `_mm512_mask_i32gather_epi32` takes the values from `src` at the places where the mask is not set. If you take `type_j` itself as `src` parameter, you get the same kind of undefined behavior as writing `int x=x;` (your compiler is probably able to warn you about using uninitialized values). – chtz Jun 21 '22 at 11:14
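The per-lane behaviour described in the comments can be written out as a scalar model. This is a sketch of the documented semantics, not of how the hardware implements the instruction; masked_gather_epi32_model is a hypothetical name, and the 16-element arrays stand in for the __m512i operands.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of a 16-lane masked gather of 32-bit elements.
 * For each lane: if the mask bit is set, load 4 bytes from
 * base + index[lane] * scale; otherwise copy the lane from src.
 * dst and src may be the same array. */
void masked_gather_epi32_model(int32_t dst[16], const int32_t src[16],
                               uint16_t mask, const int32_t index[16],
                               const void *base, int scale) {
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    for (int lane = 0; lane < 16; ++lane) {
        if ((mask >> lane) & 1) {
            /* Byte address of the element selected by this lane's index. */
            const char *addr =
                (const char *)base + (int64_t)index[lane] * scale;
            memcpy(&dst[lane], addr, sizeof dst[lane]);
        } else {
            /* Mask bit clear: the result is whatever src holds here. */
            dst[lane] = src[lane];
        }
    }
}
```

In this model the unselected lanes are only as well defined as src is. If src is uninitialized, as type_j is in the original line, those lanes can hold anything, which would explain why the two versions differ only where the mask bit is 0. Passing an explicitly initialized src (for example all zeros) to both versions should make them comparable. Note that the AVX2 counterpart, _mm256_mask_i32gather_epi32, takes its arguments in a different order (src, base address, index vector, mask, scale) and uses a vector mask in which the top bit of each 32-bit element selects the lane, so a bitmask such as 3101 has to be expanded into that form first.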
