I am looking for an intrinsic function that can take the eight 32-bit integers in an AVX2 register and store each of them at its own index in an array (essentially the store equivalent of `_mm256_i32gather_epi32`). As far as I can tell, such a function doesn't exist, but I'm not sure whether I'm just missing something, as I am new to SIMD programming.
- The store equivalent of `gather` is `scatter`, but you need AVX512 for that: https://www.felixcloutier.com/x86/vpscatterdd:vpscatterdq:vpscatterqd:vpscatterqq – chtz Jun 09 '22 at 14:40
- Related: [What do you do without fast gather and scatter in AVX2 instructions?](https://stackoverflow.com/q/51128005) for qword gather/scatter, with details about a specific use-case that let me compare a purely scalar strategy. – Peter Cordes Jun 09 '22 at 18:18
1 Answer
You’re correct, that instruction doesn’t exist in AVX2. Here’s one possible workaround, but note that it will compile into quite a few instructions. If you can, do something else instead.

    #include <immintrin.h>
    #include <stdint.h>

    // Store 4 integers from SSE vector using offsets from another vector
    inline void scatter( int* rdi, __m128i idx, __m128i data )
    {
        // Lane 0 has a dedicated move; lanes 1-3 need SSE4.1 extracts
        rdi[ (uint32_t)_mm_cvtsi128_si32( idx ) ] = _mm_cvtsi128_si32( data );
        rdi[ (uint32_t)_mm_extract_epi32( idx, 1 ) ] = _mm_extract_epi32( data, 1 );
        rdi[ (uint32_t)_mm_extract_epi32( idx, 2 ) ] = _mm_extract_epi32( data, 2 );
        rdi[ (uint32_t)_mm_extract_epi32( idx, 3 ) ] = _mm_extract_epi32( data, 3 );
    }

    // Store 8 integers from AVX vector using offsets from another vector
    inline void scatter( int* rdi, __m256i idx, __m256i data )
    {
        // Low 128-bit halves first, then the high halves
        scatter( rdi, _mm256_castsi256_si128( idx ), _mm256_castsi256_si128( data ) );
        scatter( rdi, _mm256_extracti128_si256( idx, 1 ), _mm256_extracti128_si256( data, 1 ) );
    }
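
For comparison, here is a rough sketch of another way to get the same effect: spill both vectors to the stack and do the 8 stores in a plain loop, switching to the hardware scatter mentioned in the comments when the build targets AVX-512F + AVX-512VL. The names `scatter8`, `scatter8_arrays` and `SCATTER_TARGET` are made up for this illustration, and the `target` attribute is only an assumption about building with GCC or Clang without `-mavx2`. I have not measured which variant is faster, so treat it as a starting point.

```cpp
#include <immintrin.h>
#include <stdint.h>

// Hypothetical convenience macro: lets this sketch build on GCC/Clang even
// without -mavx2. Drop it if you already compile with AVX2 enabled.
#if defined( __GNUC__ ) && !defined( __AVX2__ )
#define SCATTER_TARGET __attribute__( ( target( "avx2" ) ) )
#else
#define SCATTER_TARGET
#endif

// Store 8 integers from an AVX2 vector at the indices held in another vector.
// scatter8 is an illustrative name, not a library function.
SCATTER_TARGET inline void scatter8( int* dst, __m256i idx, __m256i data )
{
#if defined( __AVX512F__ ) && defined( __AVX512VL__ )
    // Hardware scatter; scale 4 == sizeof( int ). Indices are treated as signed.
    _mm256_i32scatter_epi32( dst, idx, data, 4 );
#else
    // AVX2 fallback: spill both vectors to the stack, then do 8 scalar stores
    alignas( 32 ) uint32_t i[ 8 ];
    alignas( 32 ) int v[ 8 ];
    _mm256_store_si256( ( __m256i* )i, idx );
    _mm256_store_si256( ( __m256i* )v, data );
    for( int k = 0; k < 8; k++ )
        dst[ i[ k ] ] = v[ k ];
#endif
}

// Usage helper (also hypothetical): scatter 8 values given as plain arrays
SCATTER_TARGET inline void scatter8_arrays( int* dst, const int* idx, const int* data )
{
    const __m256i vIdx = _mm256_loadu_si256( ( const __m256i* )idx );
    const __m256i vData = _mm256_loadu_si256( ( const __m256i* )data );
    scatter8( dst, vIdx, vData );
}
```

Calling `scatter8_arrays( dst, idx, data )` with `idx = { 3, 0, 15, ... }` writes `data[ 0 ]` to `dst[ 3 ]`, `data[ 1 ]` to `dst[ 0 ]`, and so on. One difference to keep in mind: the AVX-512 instruction sign-extends the 32-bit indices, while the fallback loop treats them as unsigned, so the two paths only agree for indices below 2^31.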

Soonts