
I am manually optimizing some code with AVX2 intrinsics. At one point I need to collect floats from an (unaligned) array with _mm256_i32gather_ps(), because they lie at scattered, non-contiguous positions.

However, I do not get the values I expect. I checked the index vector, which is correct, and even when I hard-code the indices, as in

idx = _mm256_set_epi32(100,101,102,103,104,105,106,107);
values = _mm256_i32gather_ps(array,idx,1);

I do not get the expected values.

As far as I know, neither the array nor the individual accesses need to be aligned. Does anyone see what I might be doing wrong?

Thanks

gramuc
  • You probably need a scale of 4, not 1. – Jester Oct 08 '15 at 13:56
  • As @Jester says, the indices are *byte* offsets, so you need to multiply them by `sizeof(float)`, i.e. pass 4 for `scale`. See [this question](http://stackoverflow.com/questions/16193434/avx2-gather-instructions-load-address-calculation). – Paul R Oct 08 '15 at 15:22
  • Perfect answer. thank you! – gramuc Oct 09 '15 at 11:00
  • This answer just saved me some time and grief. I totally overlooked _"scale should be 1, 2, 4 or 8"_ in Intel's documentation, and instead paid attention to their (pseudo?)code `dst[i+31:i] := MEM[base_addr + SignExtend(vindex[i+31:i])*scale]`, which is very easily misleading as it lacks `*8` after `scale` and **all** of the other array accessors in that statement use _bit_ offsets. As a result of my misunderstanding, I tried to use `32` for the scale argument, but it produced this error in MSVC: "illegal argument to intrinsic function, parameter 2". Using `4` instead works perfectly. – M-Pixel Oct 02 '18 at 20:45
