There is movdqu, available via _mm_loadu_si128, which requires SSE2.
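There is also vmovdqu8 (and vmovdqu16, vmovdqu32, vmovdqu64), available via _mm_loadu_epi8 (and _mm_loadu_epi16, _mm_loadu_epi32, _mm_loadu_epi64), which requires AVX512BW + AVX512VL (8- and 16-bit elements) or AVX512F + AVX512VL (32- and 64-bit elements).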
What is the purpose of the latter if both apparently do the same thing? If the purpose is the mask, why are unmasked versions such as _mm_loadu_epi8 exposed as intrinsics at all?