I have some problems with AVX2 instructions. I wrote a program in C that reads a binary file of unsigned chars and then sums them. Now I want to replace the C for loop with AVX2 instructions, but it doesn't work. This is the first time I have used AVX2. I know the bottleneck is the I/O, but I still want to try out AVX.
Currently, this is the relevant part of my AVX code:
__m256i v1;
__m256i v2;
__m256i sum;
for(j = 0; j < filelen/2; j += 64)
{
    v1 = _mm256_loadu_si256((__m256i *)&(t[j]));
    v2 = _mm256_loadu_si256((__m256i *)&(t[j+32]));
    sum += _mm256_add_epi8(v1, v2);
}
unsigned int *storege = (unsigned int*)&sum;
I have the data in the array "t", about 500K unsigned chars. My idea was to sum the first 32 elements with the following 32 in one step, but my code doesn't work.
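
For comparison, here is a minimal sketch of one way a byte sum like this could be written with AVX2. It is not a definitive implementation. It relies on the fact that `_mm256_add_epi8` adds within 8-bit lanes and wraps modulo 256, so the sketch swaps in `_mm256_sad_epu8` against a zero vector, which sums each group of 8 bytes into a 64-bit lane. The function names `sum_bytes`, `sum_bytes_avx2` and `sum_bytes_scalar` are hypothetical and not part of the program above. The `target` attribute and `__builtin_cpu_supports` assume GCC or Clang on x86.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Plain C reference; also handles the tail and CPUs without AVX2. */
static uint64_t sum_bytes_scalar(const unsigned char *t, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += t[i];
    return total;
}

/* AVX2 path. The target attribute (GCC/Clang, an assumption of this
   sketch) lets the function compile even without -mavx2. */
__attribute__((target("avx2")))
static uint64_t sum_bytes_avx2(const unsigned char *t, size_t n)
{
    const __m256i zero = _mm256_setzero_si256();
    __m256i acc = _mm256_setzero_si256();   /* four 64-bit partial sums */
    size_t i = 0;

    for (; i + 32 <= n; i += 32) {
        __m256i v = _mm256_loadu_si256((const __m256i *)(t + i));
        /* |v - 0| summed per group of 8 bytes: widens to 64 bits,
           so nothing wraps the way 8-bit adds would. */
        acc = _mm256_add_epi64(acc, _mm256_sad_epu8(v, zero));
    }

    /* Horizontal sum of the four 64-bit lanes. */
    uint64_t lanes[4];
    _mm256_storeu_si256((__m256i *)lanes, acc);
    uint64_t total = lanes[0] + lanes[1] + lanes[2] + lanes[3];

    /* Remaining n % 32 bytes are summed with the scalar loop. */
    return total + sum_bytes_scalar(t + i, n - i);
}

/* Hypothetical entry point: picks the AVX2 path when the CPU has it. */
uint64_t sum_bytes(const unsigned char *t, size_t n)
{
    if (__builtin_cpu_supports("avx2"))
        return sum_bytes_avx2(t, n);
    return sum_bytes_scalar(t, n);
}
```

With the array from the question, a call would look like `sum_bytes(t, filelen)`. The loop walks the whole buffer 32 bytes at a time, and the accumulator is only reduced to a single number once, after the loop.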