
Using C++ intrinsics, is it possible to load sixteen bytes (or eight, if sixteen is not possible) from memory into a SIMD register so that each byte becomes a 32-bit integer element?

user997112
  • You can load 16 bytes from memory address Z into an __m128i using X = _mm_load_si128(Z); the data must be aligned on a 16-byte boundary. The calls you make subsequently determine how the data is treated. Of course, 32-bit integers are 4 bytes each. You can't control what goes in the SIMD registers directly in C. – Simon Goater Feb 06 '23 at 12:05
  • You can use _mm_lddqu_si128 (in SSE3) instead for unaligned data. Your question has tags for avx and avx512, which use 32- and 64-byte registers respectively. Each uses its own intrinsics, but they are not as widely supported in hardware as the 16-byte SSE intrinsics. It's quite common to assume at least SSE2 support. – Simon Goater Feb 06 '23 at 12:22
  • Look for `_mm512_cvtepi8_epi32` or `_mm512_cvtepu8_epi32`, depending on whether you want sign extension or zero extension. – chtz Feb 06 '23 at 12:31
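
A minimal sketch of what the comments above describe, assuming zero extension is wanted (swap in the `cvtepi8` variants for sign extension). The function name `widen16_u8_to_u32` is hypothetical, and the compile-time branching on `__AVX512F__`, `__AVX2__` and `__SSE4_1__` is just one way to keep the example buildable on any target. It uses unaligned loads and stores, so neither pointer needs 16-byte alignment:

```cpp
#include <cstdint>
#include <cstring>

#if defined(__AVX512F__) || defined(__AVX2__) || defined(__SSE4_1__)
#include <immintrin.h>
#endif

// Hypothetical helper (the name is an assumption, not from the thread):
// widens the 16 unsigned bytes at `src` into 16 32-bit integers at `dst`.
void widen16_u8_to_u32(const std::uint8_t* src, std::uint32_t* dst) {
#if defined(__AVX512F__)
    // One 16-byte load, then zero-extend all 16 bytes into one 512-bit register.
    __m128i bytes = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
    __m512i wide = _mm512_cvtepu8_epi32(bytes);
    _mm512_storeu_si512(dst, wide);
#elif defined(__AVX2__)
    // Two 8-byte loads, each zero-extended into a 256-bit register.
    for (int half = 0; half < 2; ++half) {
        __m128i bytes = _mm_loadl_epi64(
            reinterpret_cast<const __m128i*>(src + 8 * half));
        __m256i wide = _mm256_cvtepu8_epi32(bytes);
        _mm256_storeu_si256(reinterpret_cast<__m256i*>(dst + 8 * half), wide);
    }
#elif defined(__SSE4_1__)
    // Four 4-byte loads, each zero-extended into a 128-bit register.
    for (int quarter = 0; quarter < 4; ++quarter) {
        std::int32_t four_bytes;
        std::memcpy(&four_bytes, src + 4 * quarter, sizeof four_bytes);
        __m128i bytes = _mm_cvtsi32_si128(four_bytes);
        __m128i wide = _mm_cvtepu8_epi32(bytes);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + 4 * quarter), wide);
    }
#else
    // Scalar fallback when none of the widening intrinsics are enabled.
    for (int i = 0; i < 16; ++i) {
        dst[i] = src[i];
    }
#endif
}
```

With GCC or Clang, the AVX-512 path would typically be enabled with `-mavx512f`, the AVX2 path with `-mavx2`, and the SSE4.1 path with `-msse4.1`; the binary then requires a CPU that supports the chosen instruction set.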

0 Answers