I've been searching for a while now, but can't seem to find anything useful in the documentation or on SO. This question didn't really help me out, since it makes references to modifying the assembly and I am writing in C.
I have some code making indirect accesses that I want to vectorize.
for (i = 0; i < LENGTH; ++i) {
    foo[bar[i]] *= 2;
}
Since the indices I want to double are in bar, I was wondering whether there is a way to load the elements of foo at those indices into a vector register, apply my math, and store the results back to the same indices.
Something like the following. I just made up the load and store instructions because I couldn't find anything like them in the AVX or SSE documentation. I think I read somewhere that AVX2 has similar functions, but the processor I'm working with doesn't support AVX2.
for (i = 0; i < LENGTH; i += 8) {
    // For simplicity, I'm leaving out any pointer type casting
    // _mm256_load_indirect and _mm256_store_indirect are made up
    __m256 ymm0 = _mm256_load_indirect(foo, bar+i); // foo[bar[i]] .. foo[bar[i+7]]
    __m256 ymm1 = _mm256_set1_ps(2.0f); // Set up vector of just 2's
    __m256 ymm2 = _mm256_mul_ps(ymm0, ymm1);
    _mm256_store_indirect(foo, bar+i, ymm2);
}
Are there any instructions in AVX or SSE that will allow me to load a vector register with elements of one array, selected by indices stored in a different array? Or any "hacky" ways around it if there isn't an explicit function?
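To make the question concrete, here is a rough sketch of the kind of "hacky" fallback I have in mind: do the indexed loads and stores with scalar code and only the multiply in a vector register. It assumes foo is an array of float and bar an array of int holding valid indices, and that the indices inside each group of 8 are distinct (with duplicates, the scalar loop would double an element more than once, while this version would not). The function name double_indexed is made up for illustration, and I have no idea whether this is actually faster than the plain loop.

```c
#include <stddef.h>
#ifdef __AVX__
#include <immintrin.h>
#endif

/* Doubles foo[bar[i]] for every i in [0, length).
   Assumption: the 8 indices in each block bar[i..i+7] are distinct. */
void double_indexed(float *foo, const int *bar, size_t length)
{
    size_t i = 0;
#ifdef __AVX__
    const __m256 two = _mm256_set1_ps(2.0f);
    for (; i + 8 <= length; i += 8) {
        const int *idx = bar + i;
        float tmp[8];

        /* Hand-written "gather": eight scalar loads packed into one register.
           _mm256_set_ps takes its arguments from the highest lane down. */
        __m256 v = _mm256_set_ps(foo[idx[7]], foo[idx[6]], foo[idx[5]], foo[idx[4]],
                                 foo[idx[3]], foo[idx[2]], foo[idx[1]], foo[idx[0]]);

        v = _mm256_mul_ps(v, two);

        /* Hand-written "scatter": spill to a temporary, then eight scalar stores. */
        _mm256_storeu_ps(tmp, v);
        for (int k = 0; k < 8; ++k) {
            foo[idx[k]] = tmp[k];
        }
    }
#endif
    /* Scalar tail; this is also the whole loop when AVX is not enabled. */
    for (; i < length; ++i) {
        foo[bar[i]] *= 2.0f;
    }
}
```

I would call it as double_indexed(foo, bar, LENGTH) and compile with AVX enabled (for example -mavx on GCC or Clang); without that flag only the scalar loop is built.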