You use __mmask8 with other AVX-512 intrinsics, like _mm512_maskz_add_pd(__mmask8 k, __m512d a, __m512d b), to do a zero-masking add, producing 0.0 in elements where the mask bit was zero and the normal result where the mask bit was one.
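As a rough sketch of the zero-masking add described above: the helper name add_where_equal is made up for illustration, and the target attribute is a GCC/clang extension used here only so the snippet compiles without -mavx512f. It assumes the mask comes from an equality compare, which may not be what you actually want.

```c
#include <immintrin.h>

// Hypothetical helper (not from the question): out[i] = a[i] + b[i] in lanes
// where a[i] == b[i], and 0.0 in every other lane.
__attribute__((target("avx512f")))
void add_where_equal(const double a[8], const double b[8], double out[8])
{
    __m512d va = _mm512_loadu_pd(a);
    __m512d vb = _mm512_loadu_pd(b);

    // One mask bit per double: bit i is set when a[i] == b[i] (ordered, quiet).
    __mmask8 k = _mm512_cmp_pd_mask(va, vb, _CMP_EQ_OQ);

    // Zero-masking: lanes whose mask bit is 0 are written as 0.0.
    __m512d sum = _mm512_maskz_add_pd(k, va, vb);
    _mm512_storeu_pd(out, sum);
}
```

This only runs on a CPU with AVX-512F; calling it elsewhere faults with an illegal instruction.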
To count matches, _popcnt32(mask) works; __mmask8 can implicitly convert to/from integer types. (In fact it's just a typedef for uint8_t in existing implementations.)
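For example, counting equal elements in one pair of 512-bit vectors could look something like this. The function name count_equal8 is hypothetical, and the target attribute is again a GCC/clang convenience rather than anything required by the intrinsics.

```c
#include <immintrin.h>

// Hypothetical helper: number of positions i in 0..7 where a[i] == b[i].
__attribute__((target("avx512f,popcnt")))
int count_equal8(const double a[8], const double b[8])
{
    __mmask8 k = _mm512_cmp_pd_mask(_mm512_loadu_pd(a),
                                    _mm512_loadu_pd(b),
                                    _CMP_EQ_OQ);

    // __mmask8 converts implicitly to an integer, so it can go straight
    // into popcnt; only the low 8 bits can ever be set.
    return _popcnt32(k);
}
```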
But are you sure you want AVX-512 masking? You only tagged your question [avx], and comparing into a mask register is a new feature in AVX-512. Without AVX-512, you'd use AVX1 _mm256_cmp_pd(a, b, _CMP_EQ_OQ) to get a vector, and AVX1 _mm256_movemask_pd on that to get an int bitmap.
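A minimal sketch of that AVX1 version, with a made-up function name (equal_bitmap4) and the same GCC/clang target attribute assumption as a stand-in for compiling with -mavx:

```c
#include <immintrin.h>

// Hypothetical helper: returns a 4-bit bitmap, bit i set when a[i] == b[i].
__attribute__((target("avx")))
int equal_bitmap4(const double a[4], const double b[4])
{
    __m256d va = _mm256_loadu_pd(a);
    __m256d vb = _mm256_loadu_pd(b);

    // Each element of cmp is all-ones where equal, all-zeros otherwise.
    __m256d cmp = _mm256_cmp_pd(va, vb, _CMP_EQ_OQ);

    // movemask collects the sign bit of each double into the low 4 bits.
    return _mm256_movemask_pd(cmp);
}
```

The resulting int can be popcounted the same way as an AVX-512 mask if you just want the number of matches.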
Or if you're doing this over multiple vectors, use integer subtraction to accumulate a count, like AVX2 counts = _mm256_sub_epi64(counts, _mm256_castpd_si256(cmp_result)); and then hsum (horizontal sum) that at the end. (A compare result vector has elements with all-0 or all-1 bits, i.e. integer 0 or -1, so subtracting it adds 1 to the count for each match.) See