
I came across these two functions:

  • _mm512_setzero_epi32()
  • _mm512_setzero_si512()

Logically, they do the same thing. I also checked the generated assembly and found it identical under different optimization levels.

It's a simple question: why does AVX-512 have such a duplicated design for zeroing an integer vector?

Jigao Luo
  • There are some duplications in AVX-512: https://stackoverflow.com/questions/53905757/what-is-the-difference-between-mm512-load-epi32-and-mm512-load-si512 – Jigao Luo Dec 13 '22 at 09:11

1 Answer


_mm512_setzero_epi32() is 100% redundant; there's no reason to ever use it

For coding-style reasons, I'd recommend against it. It doesn't follow the _mm_setzero_si128() / _mm256_setzero_si256() pattern for returning an all-zero SIMD-integer vector, which _mm512_setzero_si512() does follow.

The situation is very similar to the useless and redundant _mm512_loadu_epi32 (which confusingly loads a whole 64-byte vector, not a 4-byte scalar). Not all compilers even support _mm512_loadu_epi32 or _mm512_loadu_epi64, which might also be the case for _mm512_setzero_epi32; another reason to avoid it in favour of more standard and obvious ones.

Redundant intrinsics like _mm512_loadu_epi32 and _mm512_and_epi32 are at least part of a pattern with _mm512_maskz_loadu_epi32 and _mm512_mask_loadu_epi32: masking requires an element size, and having an unmasked intrinsic completes the set, just as with _mm512_add_epi32, where different element-size versions of the same operation have to exist and there is no _si512 version.

But there are no merge-masking or zero-masking setzero intrinsics in the current version of the intrinsics guide. So there's no pattern for setzero_epi32 to be part of.


In asm, there is no vpxor zmm, only vpxord and vpxorq, because essentially all AVX-512 instructions support masking, and that means there has to be an element size. (Same for moves like vmovdqa64 / 32.)

So does _mm512_setzero_epi32() imply use of vpxord? No: Intel's intrinsics guide actually documents it as using vpxorq, like all the other 512-bit zeroing intrinsics, including _mm512_setzero_ps(). (Fun fact: EVEX vxorps requires the AVX512DQ extension, which KNL Xeon Phi doesn't support; only mainstream CPUs, Skylake-avx512 and later, have it.)

As for which zeroing instruction compilers actually choose, it could be either, and it makes no difference.

Peter Cordes