Why are the alternate elements in a vector in the output of _mm256_mul_epi32 avx intrinsic instruction zero?

Question

I am learning SIMD instructions. I tried to implement a vector dot product using AVX intrinsics, but to my astonishment, every other 32-bit element of the 256-bit result vector is zero.

Below is a short program that reproduces the problem. I am a beginner with AVX intrinsics. Could you please point out where I am making the mistake?

#include <cstdlib>   // aligned_alloc
#include <iostream>
#include <immintrin.h>
#define ALIGN 64 // cache line size
using namespace std;
int main()
{
        int* a= (int*) aligned_alloc(ALIGN, sizeof(int)*8);
        int* b= (int*) aligned_alloc(ALIGN, sizeof(int)*8);
        
        a[0]=103; a[1]=198; a[2]= 105; a[3]=115; a[4]=81; a[5]=255; a[6]=74; a[7]=236;
        b[0]=8; b[1]=172; b[2]=163; b[3]=32; b[4]=62; b[5]=247; b[6]= 73; b[7]=132;

        __m256i* A=(__m256i*)a;
        __m256i* B=(__m256i*)b;
        
        __m256i temp=_mm256_mul_epi32(A[0],B[0]);
        
        int* ptr=(int*)(&temp);
        cout<<ptr[0]<<" "<<ptr[1]<<" "<<ptr[2]<<" "<<ptr[3]<<" "<<ptr[4]<<" "<<ptr[5]<<" "<<ptr[6]<<" "<<ptr[7]<<endl;

}

Output:

abhishek@abhishek:~$ ./test
824 0 17115 0 5022 0 5402 0

I have no clue as to why the alternate elements are zero.

`_mm_mul_epi32` is the SSE4.1 widening multiply (`pmuldq`), 32x32 => 64-bit: it multiplies only the even-numbered 32-bit elements and produces 64-bit results; check the docs, https://www.felixcloutier.com/x86/pmuldq or the intrinsics guide. You're looking for the AVX2 version of `_mm_mullo_epi32` (SSE4.1 `pmulld`, packed 32x32 => 32-bit), which is `_mm256_mullo_epi32`. — Peter Cordes, Nov 23 '22 at 12:03
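To make the difference between the two multiplies concrete, here is a scalar model in portable C++ of what each one computes per element. It is only a sketch of the lane semantics, not the intrinsics themselves: the names `model_mul_epi32`, `model_mullo_epi32`, and `view_as_i32` are made up for this illustration, and the 32-bit view assumes a little-endian machine such as x86.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Vec8i32 = std::array<std::int32_t, 8>;
using Vec4i64 = std::array<std::int64_t, 4>;

// Scalar model (illustrative name) of _mm256_mul_epi32 / vpmuldq:
// only elements 0, 2, 4, 6 are read, and each pair yields one
// sign-extended 64-bit product. Odd-indexed inputs are ignored.
Vec4i64 model_mul_epi32(const Vec8i32& a, const Vec8i32& b) {
    Vec4i64 out{};
    for (std::size_t i = 0; i < 4; ++i)
        out[i] = static_cast<std::int64_t>(a[2 * i]) * b[2 * i];
    return out;
}

// Scalar model (illustrative name) of _mm256_mullo_epi32 / vpmulld:
// all eight elements are multiplied and only the low 32 bits of each
// product are kept.
Vec8i32 model_mullo_epi32(const Vec8i32& a, const Vec8i32& b) {
    Vec8i32 out{};
    for (std::size_t i = 0; i < 8; ++i) {
        // Multiply as unsigned so that wraparound is well defined.
        std::uint32_t low = static_cast<std::uint32_t>(a[i]) *
                            static_cast<std::uint32_t>(b[i]);
        std::memcpy(&out[i], &low, sizeof low);
    }
    return out;
}

// Reinterpret four 64-bit lanes as eight 32-bit values, which is what
// the question's printing code effectively does. memcpy is used so the
// reinterpretation does not violate strict aliasing.
Vec8i32 view_as_i32(const Vec4i64& v) {
    Vec8i32 out{};
    static_assert(sizeof out == sizeof v, "both views cover 256 bits");
    std::memcpy(out.data(), v.data(), sizeof out);
    return out;
}
```

With the question's inputs, `view_as_i32(model_mul_epi32(a, b))` gives `824 0 17115 0 5022 0 5402 0`, matching the reported output: each zero is the upper half of a 64-bit product that fits in 32 bits.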
Also, that's not a safe way to access vector elements, unless you compile with MSVC, or `gcc -fno-strict-aliasing`. (But don't do that, see [print a \_\_m128i variable](https://stackoverflow.com/a/46752535) for a safe way.) Also use `_mm256_load_si256( (__m256i*)a )` instead of a pointer cast. You don't need dynamic storage for this, you could have done `alignas(32) int a[] = {103, 198, ...};` — Peter Cordes, Nov 23 '22 at 12:06
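Putting both comments together, a minimal sketch of the element-wise multiply might look like the following. The helper name `mul8_i32` and the `AVX2_TARGET` macro are assumptions made for this example. The macro only lets GCC or Clang compile the function without `-mavx2`; the CPU must still support AVX2 at run time. Both input pointers are assumed to be 32-byte aligned, for example arrays declared with `alignas(32)`.

```cpp
#include <immintrin.h>
#include <array>
#include <cstddef>
#include <cstdint>

// Illustrative convenience macro: enables AVX2 for one function on
// GCC/Clang. With MSVC, or when building with -mavx2, it is not needed.
#if defined(__GNUC__) || defined(__clang__)
#define AVX2_TARGET __attribute__((target("avx2")))
#else
#define AVX2_TARGET
#endif

// Hypothetical helper: element-wise 32-bit products of two arrays of
// eight ints. Both a and b must be 32-byte aligned.
AVX2_TARGET
std::array<std::int32_t, 8> mul8_i32(const std::int32_t* a,
                                     const std::int32_t* b) {
    // Explicit aligned loads instead of dereferencing a casted pointer.
    __m256i va = _mm256_load_si256(reinterpret_cast<const __m256i*>(a));
    __m256i vb = _mm256_load_si256(reinterpret_cast<const __m256i*>(b));

    // Packed 32x32 => 32-bit multiply: all eight elements, low halves kept.
    __m256i prod = _mm256_mullo_epi32(va, vb);

    // Store to an aligned int buffer rather than reading the vector
    // through an int* cast, which would break strict aliasing.
    alignas(32) std::int32_t tmp[8];
    _mm256_store_si256(reinterpret_cast<__m256i*>(tmp), prod);

    std::array<std::int32_t, 8> out{};
    for (std::size_t i = 0; i < 8; ++i)
        out[i] = tmp[i];
    return out;
}
```

Called with the question's data stored in `alignas(32)` arrays, this yields eight non-zero products; adding them up gives the dot product the question was after.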
