I have the following code, which I compiled with g++ 6.1 (MinGW) on Windows:
unsigned char test_uc[8] = {0, 0xaa, 0xbb, 0xcc, 0x11, 0x22, 0x33, 0x44};
uint64_t* p64 = (uint64_t*)test_uc;
__m256i res = _mm256_cvtepu8_epi32 (_mm_cvtsi64_si128(*p64));
uint32_t* u32 = (uint32_t*)&res;
for (int i = 0; i < 8; i++)
    printf("%d.0x%x\n", i, u32[i]);
When I compile with -O1 and run the program, I get the expected output:
0.0x0
1.0xaa
2.0xbb
3.0xcc
4.0x11
5.0x22
6.0x33
7.0x44
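For reference, the output I expect corresponds to the scalar sketch below: `_mm256_cvtepu8_epi32` zero-extends each of the low 8 bytes of its input to a 32-bit lane. The helper name `widen_u8_to_u32_reference` is my own and purely illustrative; it is not part of any library, and it deliberately avoids intrinsics and pointer casts so it can serve as a baseline to compare against.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference (name is illustrative only):
// zero-extends 8 unsigned bytes to 8 unsigned 32-bit values,
// which is the per-lane result expected from _mm256_cvtepu8_epi32.
std::array<std::uint32_t, 8> widen_u8_to_u32_reference(const unsigned char* src)
{
    std::array<std::uint32_t, 8> out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        // unsigned char -> uint32_t is a zero-extension, never a sign-extension
        out[i] = static_cast<std::uint32_t>(src[i]);
    }
    return out;
}
```

Calling it on `test_uc` and printing each element with the same `printf` format should reproduce the -O1 listing above.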
However, when I compile with -O3 instead, I get this strange output:
0.0x0
1.0x0
2.0x0
3.0x0
4.0x8
5.0x0
6.0x41027f
7.0x0
What is going on here? Why does the output change between -O1 and -O3?