I am learning SIMD instructions. I tried to implement a vector dot product using AVX intrinsics, but to my astonishment, every other 32-bit element of the 256-bit result vector is zero.
I wrote a short program that reproduces the problem. I am a beginner with AVX intrinsics. Could you please point out where I am making the mistake?
#include <cstdlib> // aligned_alloc
#include <iostream>
#include <immintrin.h>
#define ALIGN 64 // cache line size in bytes
using namespace std;
int main()
{
int* a= (int*) aligned_alloc(ALIGN, sizeof(int)*8);
int* b= (int*) aligned_alloc(ALIGN, sizeof(int)*8);
a[0]=103; a[1]=198; a[2]= 105; a[3]=115; a[4]=81; a[5]=255; a[6]=74; a[7]=236;
b[0]=8; b[1]=172; b[2]=163; b[3]=32; b[4]=62; b[5]=247; b[6]= 73; b[7]=132;
__m256i* A=(__m256i*)a;
__m256i* B=(__m256i*)b;
__m256i temp=_mm256_mul_epi32(A[0],B[0]);
int* ptr=(int*)(&temp);
cout<<ptr[0]<<" "<<ptr[1]<<" "<<ptr[2]<<" "<<ptr[3]<<" "<<ptr[4]<<" "<<ptr[5]<<" "<<ptr[6]<<" "<<ptr[7]<<endl;
}
Output:
abhishek@abhishek:~$ ./test
824 0 17115 0 5022 0 5402 0
I have no clue as to why the alternate elements are zero.
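
For reference, the output above is consistent with the documented behaviour of `_mm256_mul_epi32`: it treats each register as four 64-bit lanes, multiplies only the low signed 32-bit integer of each lane, and stores the full 64-bit product. If the intent is an element-wise 32-bit product, `_mm256_mullo_epi32` (AVX2) is probably the intrinsic wanted. The sketch below is a plain scalar model of both operations, not the intrinsics themselves. The helper names `model_mul_epi32`, `model_mullo_epi32`, and `view_as_int32` are made up for illustration, and a little-endian machine is assumed, as on x86.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Scalar model of _mm256_mul_epi32 (hypothetical helper name):
// for each of the four 64-bit lanes, multiply the low signed 32-bit
// elements (indices 0, 2, 4, 6) and keep the full 64-bit product.
// The odd-indexed inputs are ignored.
std::array<std::int64_t, 4> model_mul_epi32(const std::array<std::int32_t, 8>& a,
                                            const std::array<std::int32_t, 8>& b) {
    std::array<std::int64_t, 4> r{};
    for (int lane = 0; lane < 4; ++lane) {
        r[lane] = static_cast<std::int64_t>(a[2 * lane]) * b[2 * lane];
    }
    return r;
}

// Scalar model of _mm256_mullo_epi32 (hypothetical helper name):
// multiply all eight 32-bit pairs and keep the low 32 bits of each product.
std::array<std::int32_t, 8> model_mullo_epi32(const std::array<std::int32_t, 8>& a,
                                              const std::array<std::int32_t, 8>& b) {
    std::array<std::int32_t, 8> r{};
    for (int i = 0; i < 8; ++i) {
        // Unsigned arithmetic avoids signed overflow; the low 32 bits are the same.
        r[i] = static_cast<std::int32_t>(static_cast<std::uint32_t>(a[i]) *
                                         static_cast<std::uint32_t>(b[i]));
    }
    return r;
}

// Reinterpret four 64-bit values as eight 32-bit values, mirroring the
// int* cast in the question. Assumes a little-endian machine, so the low
// half of each 64-bit product comes first.
std::array<std::int32_t, 8> view_as_int32(const std::array<std::int64_t, 4>& v) {
    std::array<std::int32_t, 8> out{};
    static_assert(sizeof(out) == sizeof(v), "both views must cover 32 bytes");
    std::memcpy(out.data(), v.data(), sizeof(out));
    return out;
}
```

With the inputs from the question, `view_as_int32(model_mul_epi32(a, b))` gives 824 0 17115 0 5022 0 5402 0. The zeros are simply the upper halves of small 64-bit products. `model_mullo_epi32(a, b)` gives all eight element-wise products, starting 824 34056 17115 3680.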