I'm having trouble with AVX2 instructions. I wrote a C program that reads a binary file of unsigned chars and sums them. Now I want to replace the C for loop with AVX2 instructions, but it doesn't work. This is the first time I've used AVX2. I know the bottleneck is the I/O, but I still want to try out AVX.

Currently this is the part of my AVX code:

    __m256i v1;
    __m256i v2;
    __m256i sum;

    for(j = 0; j < filelen/2; j += 64)
    {
        v1 = _mm256_loadu_si256((__m256i *)&(t[j]));
        v2 = _mm256_loadu_si256((__m256i *)&(t[j+32]));
        sum += _mm256_add_epi8(v1, v2);
    }
    unsigned int *storege = (unsigned int*)&sum;

I have the data in the array "t", about 500K unsigned chars. My idea was to add the first 32 elements to the following 32 in one step, but my code doesn't work.

AsdFork
  • You can't add 8 bit values like that - the values will quickly overflow. See [this question](https://stackoverflow.com/q/10930595/253056) and [this question](https://stackoverflow.com/q/10932550/253056) for how to do it in SSE, and convert the code to AVX if you feel you really need to. – Paul R Nov 07 '17 at 16:24
  • What's with all these AVX2 horizontal sum over disk questions lately? Is there an assignment going on? – Mysticial Nov 07 '17 at 16:25
  • @Mysticial: either that, or RafaNadal95 and AsdFork are the same person. – Paul R Nov 07 '17 at 16:31
  • AVX2 vpsadbw + vpaddq in the loop and then a horizontal sum at the end is what you want. – Peter Cordes Nov 07 '17 at 17:03
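
The approach suggested in the comments (vpsadbw + vpaddq in the loop, then a horizontal sum at the end) could look roughly like the sketch below. It is only an illustration, not code from the question: the function names `sum_bytes`, `sum_bytes_avx2` and `sum_bytes_scalar` are made up here, and it assumes GCC or Clang on x86 (it relies on the `target("avx2")` function attribute and `__builtin_cpu_supports`). Using `_mm256_sad_epu8` against a zero vector adds each group of 8 bytes into a 64-bit lane, so the 8-bit overflow mentioned above does not occur.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: sum n bytes into a 64-bit total. */
static uint64_t sum_bytes_scalar(const unsigned char *t, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += t[i];
    return total;
}

/* AVX2 version: processes 32 bytes per iteration.
 * The target attribute lets this compile without -mavx2 (GCC/Clang only). */
__attribute__((target("avx2")))
static uint64_t sum_bytes_avx2(const unsigned char *t, size_t n)
{
    const __m256i zero = _mm256_setzero_si256();
    __m256i acc = _mm256_setzero_si256();   /* four 64-bit partial sums */
    size_t i = 0;

    for (; i + 32 <= n; i += 32) {
        __m256i v = _mm256_loadu_si256((const __m256i *)(t + i));
        /* vpsadbw against zero: each group of 8 bytes is summed into
         * one 64-bit lane, so nothing wraps at 8 bits. */
        __m256i sad = _mm256_sad_epu8(v, zero);
        /* vpaddq: accumulate the four 64-bit partial sums. */
        acc = _mm256_add_epi64(acc, sad);
    }

    /* Horizontal sum of the four 64-bit lanes, done once at the end. */
    uint64_t lanes[4];
    _mm256_storeu_si256((__m256i *)lanes, acc);
    uint64_t total = lanes[0] + lanes[1] + lanes[2] + lanes[3];

    /* Leftover bytes when n is not a multiple of 32. */
    for (; i < n; i++)
        total += t[i];

    return total;
}

/* Dispatch at run time so the code still works on CPUs without AVX2. */
uint64_t sum_bytes(const unsigned char *t, size_t n)
{
    if (__builtin_cpu_supports("avx2"))
        return sum_bytes_avx2(t, n);
    return sum_bytes_scalar(t, n);
}
```

With the array from the question this would be called as `sum_bytes(t, filelen)`. Note that the loop runs over the whole buffer and advances by 32 bytes per load, rather than iterating to `filelen/2` in steps of 64.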
