AVX2 - storing integers at arbitrary indices in an array

Question

I am looking for an intrinsic function that can take the 8 32-bit integers in an AVX2 register and store each one at its own index in an array (essentially the store equivalent of `_mm256_i32gather_epi32`). As far as I can tell, no such function exists, but I'm new to SIMD programming, so I'm not sure whether I'm just missing something.
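For reference, here is the AVX2 gather that the question wants a store counterpart for. It loads `base[idx[i]]` for each of the 8 lanes. This is a minimal sketch, assuming GCC or Clang: the `target("avx2")` attribute is a compiler extension that lets it build without `-mavx2`, and `gather8` is a hypothetical helper name, not a library function.

```cpp
#include <immintrin.h>

// Hypothetical helper: out[i] = base[idx[i]] for i = 0..7, using vpgatherdd.
// target("avx2") is a GCC/Clang extension; the CPU must still support AVX2 at run time.
__attribute__((target("avx2")))
void gather8(const int* base, const int* idx, int* out) {
    __m256i vi = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(idx));
    // Scale 4 = sizeof(int), so each lane of vi is an element index, not a byte offset
    __m256i v = _mm256_i32gather_epi32(base, vi, 4);
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out), v);
}
```

A scatter would do the reverse: write lane `i` of a data vector to `base[idx[i]]`.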

The store equivalent of `gather` is `scatter`, but you need AVX512 for that: https://www.felixcloutier.com/x86/vpscatterdd:vpscatterdq:vpscatterqd:vpscatterqq — chtz, Jun 09 '22 at 14:40
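With AVX-512F plus AVX-512VL, that scatter is available as `_mm256_i32scatter_epi32`, which works directly on a 256-bit `__m256i`. A minimal sketch, assuming GCC or Clang (for the `target` attribute and `__builtin_cpu_supports`); the helper names are hypothetical, and the scalar fallback is only there so the code still runs on CPUs without AVX-512.

```cpp
#include <immintrin.h>

// AVX-512F + AVX-512VL path: vpscatterdd on a 256-bit vector.
// Scale 4 = sizeof(int), so idx holds element indices, not byte offsets.
__attribute__((target("avx512f,avx512vl")))
static void scatter8_avx512(int* base, const int* idx, const int* data) {
    __m256i vi = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(idx));
    __m256i vd = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(data));
    _mm256_i32scatter_epi32(base, vi, vd, 4);
}

// Scalar fallback with the same semantics: lanes are written from low to high,
// so if two indices collide, the higher lane's value is the one that remains.
static void scatter8_scalar(int* base, const int* idx, const int* data) {
    for (int i = 0; i < 8; ++i)
        base[idx[i]] = data[i];
}

// Hypothetical dispatcher: uses the hardware scatter only if the running CPU has it.
void scatter8(int* base, const int* idx, const int* data) {
    if (__builtin_cpu_supports("avx512f") && __builtin_cpu_supports("avx512vl"))
        scatter8_avx512(base, idx, data);
    else
        scatter8_scalar(base, idx, data);
}
```

The fallback mirrors the hardware ordering rule: Intel documents that overlapping scatter writes are ordered from the least to the most significant element.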
Related: [What do you do without fast gather and scatter in AVX2 instructions?](https://stackoverflow.com/q/51128005) for qword gather/scatter, with details about a specific use case that let me compare against a purely scalar strategy. — Peter Cordes, Jun 09 '22 at 18:18

score 1 · Accepted Answer · answered Jun 09 '22 at 15:00

You’re correct, that instruction doesn’t exist in AVX2. Here’s one possible workaround, but note that it compiles into quite a few instructions (at least one index extract and one scalar store per element). If you can, do something else instead.

```cpp
#include <immintrin.h>  // SSE4.1 / AVX2 intrinsics
#include <cstdint>      // uint32_t

// Store 4 integers from SSE vector using offsets from another vector.
// The (uint32_t) casts treat indices as unsigned; unlike the gather, negative indices are not supported.
inline void scatter( int* rdi, __m128i idx, __m128i data )
{
    // Lanes are written from low to high, so with duplicate indices the highest lane wins
    rdi[ (uint32_t)_mm_cvtsi128_si32( idx ) ] = _mm_cvtsi128_si32( data );
    rdi[ (uint32_t)_mm_extract_epi32( idx, 1 ) ] = _mm_extract_epi32( data, 1 );
    rdi[ (uint32_t)_mm_extract_epi32( idx, 2 ) ] = _mm_extract_epi32( data, 2 );
    rdi[ (uint32_t)_mm_extract_epi32( idx, 3 ) ] = _mm_extract_epi32( data, 3 );
}

// Store 8 integers from AVX vector using offsets from another vector
inline void scatter( int* rdi, __m256i idx, __m256i data )
{
    // Low 128-bit half first, then the high half
    scatter( rdi, _mm256_castsi256_si128( idx ), _mm256_castsi256_si128( data ) );
    scatter( rdi, _mm256_extracti128_si256( idx, 1 ), _mm256_extracti128_si256( data, 1 ) );
}
```
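A variation sometimes used instead of per-lane extracts is to store both vectors to small aligned stack arrays and then do the scattered stores in plain scalar code. A minimal sketch, assuming GCC or Clang for the `target` attribute; `scatter_via_spill` and `scatter8_arrays` are hypothetical names, and whether this beats the extract version depends on the compiler and CPU, so it is worth benchmarking both.

```cpp
#include <immintrin.h>
#include <cstdint>

// Hypothetical alternative: spill idx and data to memory, then scatter with scalar stores.
// Same semantics as the extract-based version: lanes written low to high, unsigned indices.
__attribute__((target("avx2")))
void scatter_via_spill(int* base, __m256i idx, __m256i data) {
    alignas(32) int32_t i[8];
    alignas(32) int32_t d[8];
    _mm256_store_si256(reinterpret_cast<__m256i*>(i), idx);
    _mm256_store_si256(reinterpret_cast<__m256i*>(d), data);
    for (int k = 0; k < 8; ++k)
        base[(uint32_t)i[k]] = d[k];
}

// Hypothetical wrapper for callers that hold indices and values in plain arrays
__attribute__((target("avx2")))
void scatter8_arrays(int* base, const int* idx, const int* data) {
    scatter_via_spill(base,
                      _mm256_loadu_si256(reinterpret_cast<const __m256i*>(idx)),
                      _mm256_loadu_si256(reinterpret_cast<const __m256i*>(data)));
}
```

The store-then-reload round trip goes through the store buffer, which some compilers and CPUs handle at least as cheaply as a chain of `vpextrd` instructions.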
