I have recently been working on the NAMD benchmark, and I need to convert some of the AVX-512 intrinsics it uses to AVX2 (256-bit) equivalents.
One of these intrinsics is _mm512_mask_i32gather_epi32. In the main code it is used as follows:
const __m512i type_j = _mm512_mask_i32gather_epi32(type_j, r2mask, index, memory_ptr, _MM_SCALE_4);
See the Intel intrinsics page.
type_j is declared on the same line where it is passed as the first argument, which corresponds to the src vector on the intrinsic's page. We are unable to reproduce the same output with the 256-bit version of the code. When a mask bit is not set, the 512-bit version still produces a value for that element. Can we get some information on the role of the src vector and the internal working of the intrinsic?
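To make the question concrete, here is how we currently read the documented behaviour, written as a scalar model in C. This is only a sketch of our understanding, not Intel's implementation: mask_gather_epi32_model is a hypothetical helper name, and the plain arrays stand in for the 16 lanes of the __m512i values. Lanes whose mask bit is set are loaded from base + index[i] * scale; the other lanes are copied from src.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of a 16-lane masked gather of 32-bit integers.
   Hypothetical helper for illustration, not part of any Intel header.
   Lane i is loaded from base + index[i] * scale when bit i of mask is set;
   otherwise it is copied unchanged from src[i]. */
static void mask_gather_epi32_model(int32_t dst[16], const int32_t src[16],
                                    uint16_t mask, const int32_t index[16],
                                    const void *base, int scale)
{
    /* The intrinsic only accepts a scale of 1, 2, 4 or 8 bytes. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);

    for (int i = 0; i < 16; ++i) {
        if ((mask >> i) & 1u) {
            /* Byte address of the element selected by this lane's index. */
            const unsigned char *addr =
                (const unsigned char *)base + (int64_t)index[i] * scale;
            memcpy(&dst[i], addr, sizeof dst[i]);
        } else {
            /* Mask bit clear: the lane keeps whatever src holds. */
            dst[i] = src[i];
        }
    }
}
```

If this model is right, the lanes with a clear mask bit depend entirely on what src contains at the time of the call, which is the part we cannot reproduce in the 256-bit version.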
Sample: for a mask of 3101 (binary 0000110000011101), the output from the 512-bit version is:
Printing values of 512
48 0 75 75 48 48 48 48 48 75 48 75 48 75 48 48
Printing converted values
48 255 75 75 48 241 241 242 240 250 48 75 248 245 229 232
The values match wherever the mask bit is set to 1.
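As a quick cross-check of that claim, the small C sketch below compares the two printed rows lane by lane and builds a bitmask of the positions where they agree. It assumes the first printed value is lane 0 (the least significant mask bit); the array and function names are hypothetical and exist only for this check.

```c
#include <stdint.h>

/* Values copied from the two printed rows above; index 0 is assumed to be lane 0. */
static const int out_512[16] = {48, 0, 75, 75, 48, 48, 48, 48,
                                48, 75, 48, 75, 48, 75, 48, 48};
static const int out_256[16] = {48, 255, 75, 75, 48, 241, 241, 242,
                                240, 250, 48, 75, 248, 245, 229, 232};

/* Returns a 16-bit value with bit i set where a[i] equals b[i]. */
static uint16_t matching_lanes(const int a[16], const int b[16])
{
    uint16_t m = 0;
    for (int i = 0; i < 16; ++i) {
        if (a[i] == b[i]) {
            m |= (uint16_t)(1u << i);
        }
    }
    return m;
}

/* Mask of the lanes on which the two sample rows agree. */
static uint16_t sample_match_mask(void)
{
    return matching_lanes(out_512, out_256);
}
```

For the sample data, sample_match_mask() returns 3101, which is exactly the gather mask, so the two versions differ only in the lanes whose mask bit is 0.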