I am manually optimizing some code using AVX2 intrinsics. At one point I want to collect some floats from an (unaligned) array with _mm256_i32gather_ps(), because they lie at scattered, non-contiguous positions.
However, I do not get the values I expect. I have checked the index vector and it is correct. Even when I hard-code the indices, like this:
```c
__m256i idx = _mm256_set_epi32(100, 101, 102, 103, 104, 105, 106, 107);
__m256 values = _mm256_i32gather_ps(array, idx, 1);
```
I still do not get the expected values.
As far as I know, a gather requires neither the array nor the individual element accesses to be aligned. Does anyone see what I might be doing wrong?