I'm trying to vectorize the following program:

for(i=0;i<N;i++)
{
    a = arr[i];
    //arithmetic on *a* here.
    count[a]++;
}

Using intrinsics this becomes something like:

for(i=0;i<N;i+=8)
{
    __m512i a = _mm512_loadu_epi64(arr+i);
    //arithmetic on *a* here.
    __m512i gather_a = _mm512_i64gather_epi64(a,count,8);
    int64_t val1 = 1;
    __m128i temp = _mm_cvtsi64_si128(val1);
    __m512i one   = _mm512_broadcastq_epi64(temp);
    __m512i added  = _mm512_add_epi64(gather_a, one); //count[a]++;
    _mm512_i64scatter_epi64(count,a,added,8);
}

The problem is that the vectorized version's results in the output count array are slightly off here and there. Is this related to the atomicity of the AVX gather/scatter intrinsics, or is it some other problem related to aliasing on the array?

Thanks

mas
  • Did you check for conflicts between indices? `vpconflictq` exists for that purpose. If that's possible, it's not a problem of [per-element atomicity for scatter stores](https://stackoverflow.com/questions/46012574/per-element-atomicity-of-vector-load-store-and-gather-scatter), it's that vector gather / vector increment / vector scatter isn't the same as incrementing one element at a time. – Peter Cordes Oct 19 '21 at 03:47
  • Also, you don't need to use `_mm_cvtsi64_si128(1)` and so on; the normal way to write broadcast constants is `__m512i one = _mm512_set1_epi64( 1 );`. – Peter Cordes Oct 19 '21 at 03:48
  • I guess you could call conflicts (same bucket twice in the vector) a form of aliasing, so yes, aliasing. – Peter Cordes Oct 19 '21 at 03:56
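
To make the conflict issue from the comments concrete, here is a minimal scalar sketch in plain C. It is only a model of what one 8-lane gather / add / scatter step does, not real AVX-512 code. The names `LANES`, `naive_update` and `conflict_aware_update` are made up for illustration, and the model assumes that overlapping scatter lanes are written lowest lane first, so the highest duplicate lane wins. The second function adds, per lane, the number of earlier lanes holding the same index, which is the information `vpconflictq` reports as a bitmask.

```c
#include <stdint.h>

#define LANES 8  /* hypothetical: one 512-bit vector holds eight 64-bit indices */

/* Scalar model of gather / add 1 / scatter for one vector of indices.
   Every lane reads count[] before any lane writes it back, so lanes that
   share an index all store the same "old value + 1". */
static void naive_update(int64_t *count, const int64_t idx[LANES])
{
    int64_t gathered[LANES];
    for (int l = 0; l < LANES; l++)        /* gather */
        gathered[l] = count[idx[l]];
    for (int l = 0; l < LANES; l++)        /* add + scatter, lowest lane first */
        count[idx[l]] = gathered[l] + 1;
}

/* Same model, but each lane also adds the number of earlier lanes that
   hold the same index. The last duplicate lane then stores the full total. */
static void conflict_aware_update(int64_t *count, const int64_t idx[LANES])
{
    int64_t gathered[LANES], added[LANES];
    for (int l = 0; l < LANES; l++)        /* gather */
        gathered[l] = count[idx[l]];
    for (int l = 0; l < LANES; l++) {
        int64_t earlier = 0;               /* duplicates in lanes below l */
        for (int k = 0; k < l; k++)
            earlier += (idx[k] == idx[l]);
        added[l] = gathered[l] + 1 + earlier;
    }
    for (int l = 0; l < LANES; l++)        /* scatter, highest duplicate wins */
        count[idx[l]] = added[l];
}
```

With indices such as `{3, 3, 5, 3, 0, 5, 7, 1}` and a zeroed `count`, the naive model leaves `count[3]` at 1 instead of 3, which matches the "slightly off here and there" symptom. The conflict-aware model gives the same result as the scalar loop.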

0 Answers