I'm trying to find a more efficient way to "rotate" or shift the 32-bit floating-point values within an AVX __m256 vector to the right or left by one place.
Such that:
a7, a6, a5, a4, a3, a2, a1, a0
becomes
0, a7, a6, a5, a4, a3, a2, a1
(I don't mind if the data gets lost, as I replace the cell anyway.)
I've already taken a look at this thread: Emulating shifts on 32 bytes with AVX, but I don't really understand what is going on, and it doesn't explain what _MM_SHUFFLE(0, 0, 3, 0) does as an input parameter.
I'm trying to optimise this code:
_mm256_store_ps(temp, array[POS(ii, jj)]);
_mm256_store_ps(left, array[POS(ii, jj-1)]);
tmp_array[POS(ii, jj)] = _mm256_set_ps(left[0], temp[7], temp[6], temp[5], temp[4], temp[3], temp[2], temp[1]);
I know that once a shift is in place, I can use an insert to replace the remaining cell. I feel this will be more efficient than unpacking into a float[8] array and reconstructing.
-- I'd also like to be able to shift both left and right, as I need to perform a similar operation elsewhere.
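To make the intended results concrete, here is a plain C sketch of both directions for a single vector. It is only a scalar reference for checking a faster version against, not the AVX solution I'm looking for. The function names are made up for this illustration, plain float[8] arrays stand in for the __m256 values, and index 0 is the lowest lane (a0 above).

```c
#include <stddef.h>

/* Hypothetical reference helper (name invented for this sketch).
 * Moves every element one lane toward lane 0: out[i] = in[i + 1].
 * The vacated top lane (index 7) receives `fill`, which corresponds to
 * left[0] in the snippet above. in[0] is dropped. */
static void shift_toward_lane0(const float in[8], float fill, float out[8])
{
    for (size_t i = 0; i < 7; ++i)
        out[i] = in[i + 1];
    out[7] = fill;
}

/* Hypothetical reference helper for the opposite direction.
 * Moves every element one lane toward lane 7: out[i] = in[i - 1].
 * The vacated bottom lane (index 0) receives `fill`. in[7] is dropped. */
static void shift_toward_lane7(const float in[8], float fill, float out[8])
{
    for (size_t i = 7; i > 0; --i)
        out[i] = in[i - 1];
    out[0] = fill;
}
```

With in = {a0, ..., a7} and fill = left[0], shift_toward_lane0 should give the same eight values as the _mm256_set_ps call above.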
Any help is greatly appreciated! Thanks!