
I am working with SIMD and am attempting to vectorize a loop. I am trying to add a vector of indices to a pointer, left, in order to read the values that left holds at those indices, and then continue with further SIMD operations.

For example, if I were doing this without SIMD, it would look like this:

x1 = left[a]

x2 = left[b]

x3 = left[c]

x4 = left[d]

where [a, b, c, d] is stored in the vector of indices (index_left_float)

float* left_Array[] = {left, left, left, left};

__m128 left_Array_simd = _mm_load_ps((float *) left_Array);

__m128 nleft = _mm_add_ps(index_left_float, left_Array_simd);

I also tried to load nleft into a new vector in order to get the values stored inside the pointer left at the indices of nleft but it would not let me.

The only thing I can think of would be to pull the indices out of the vector, do the lookups with ordinary scalar code, and then reload the results into a vector, but this seems very costly and I am trying to optimize my code as much as possible. Any advice is appreciated! I've found the SIMD/SSE websites very hard to understand. Thanks!
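For reference, a minimal sketch of that fallback using SSE2 only. It assumes the four indices are 32-bit integers in an __m128i; if they are held as floats, as the name index_left_float suggests, they would first need converting, for example with _mm_cvttps_epi32. The helper name gather4_sse2 is hypothetical, not part of any library.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: returns { left[idx0], left[idx1], left[idx2], left[idx3] }.
 * Assumes idx holds four 32-bit integer indices that are valid for left. */
static __m128 gather4_sse2(const float *left, __m128i idx)
{
    int i[4];

    /* Spill the four indices to memory so they can be used as scalar subscripts. */
    _mm_storeu_si128((__m128i *)i, idx);

    /* _mm_set_ps takes its arguments from the highest lane down to the lowest. */
    return _mm_set_ps(left[i[3]], left[i[2]], left[i[1]], left[i[0]]);
}
```

This does four scalar loads and one vector build per call, so whether it pays off depends on how much SIMD work follows the lookup.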

Vivek Kumar Ray
Meow
  • 2
    It looks like you're trying to do a [gathered load](https://en.wikipedia.org/wiki/Gather-scatter_(vector_addressing)), which is not supported in SSE, but is supported in AVX2. However it's very inefficient and usually a sign that you're approaching vectorization from the wrong angle (or that your problem is inherently difficult/impossible to vectorize). Try posting the whole scalar loop and see if anyone can come up with an alternate strategy for vectorization. – Paul R Nov 18 '14 at 09:07
  • 2
    SIMD instructions feel comfortable with data stored at contiguous addresses, which explains why indirect addressing is not supported (was not supported, see @PaulR's comment). Think of the burden for the memory manager to collect data from several independent addresses simultaneously... –  Nov 18 '14 at 09:36
  • See also: http://stackoverflow.com/questions/19557746/what-is-the-fastest-way-to-do-a-simd-gather-with-avx2, http://stackoverflow.com/questions/23850431/sse2-how-to-load-data-from-non-contiguous-memory-locations/23907017#23907017, et al. – Paul R Nov 18 '14 at 10:10
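The gathered load mentioned in the comments can be sketched with the AVX2 intrinsic _mm_i32gather_ps. This is only an illustration under stated assumptions: the CPU supports AVX2, the indices are 32-bit integers in an __m128i, and the compiler is GCC or Clang (the target attribute is an extension of those compilers; alternatively, compile the whole file with -mavx2). The helper name gather4_avx2 is hypothetical.

```c
#include <immintrin.h>  /* AVX2 intrinsics */

/* Hypothetical helper: returns { left[idx0], left[idx1], left[idx2], left[idx3] }
 * using a hardware gather. The target attribute (GCC/Clang extension) enables
 * AVX2 for this one function; call it only on CPUs that support AVX2. */
__attribute__((target("avx2")))
static __m128 gather4_avx2(const float *left, __m128i idx)
{
    /* The last argument is the byte scale applied to each index:
     * 4 == sizeof(float), so idx counts elements, not bytes. */
    return _mm_i32gather_ps(left, idx, 4);
}
```

On GCC and Clang, __builtin_cpu_supports("avx2") can be used to check at run time before calling it. As the comments note, a gather is not necessarily fast, so it is worth measuring against a scalar version.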

0 Answers