Trying to understand alignment in relation to SIMD load operations, I am slightly confused by the output of the following example code:
double vec[4] = { 2.6, 0.0, 0.0, 0.0 };
auto reg = _mm256_load_pd(&vec[0]);
Here I define and initialize an array vec of four doubles. It gets the default alignment of 8 bytes, which makes sense, since the size of double is 8 bytes. Printing the addresses &vec[i] of the elements of vec outputs values that are 8-byte aligned:
0x000000E05D4FF968
0x000000E05D4FF970
0x000000E05D4FF978
0x000000E05D4FF980
So far, so good. However, the intrinsic function _mm256_load_pd(), which I call next, expects an address that is 256-bit (32-byte) aligned. The address of the first array element, 0x000000E05D4FF968, is not divisible by 32 (0x000000E05D4FF968 % 32 = 8), but the code runs with no problems.
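To make the modulo check above concrete, here is a minimal sketch of how the alignment of an address could be tested in code. The helper names misalignment and is_aligned are my own (hypothetical, not part of any library), and the sketch assumes a platform where a pointer converts cleanly to std::uintptr_t:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: remainder of the address when divided by
// `alignment` (in bytes). A result of 0 means the address is aligned.
inline std::uintptr_t misalignment(const void* p, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignment;
}

// Hypothetical helper: true if `p` is a multiple of `alignment` bytes.
inline bool is_aligned(const void* p, std::size_t alignment)
{
    return misalignment(p, alignment) == 0;
}
```

Calling is_aligned(&vec[0], 32) on the array from the question would report whether that particular run happened to place vec on a 32-byte boundary; for the address printed above, the remainder modulo 32 is 8 and the remainder modulo 8 is 0.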
So my main question is: how is this possible?
UPDATE:
Here is a minimal reproducible example. Of course, the addresses will differ from run to run, but it is not difficult to catch a case where the first address is not 32-byte aligned.
#include <iostream>
#include <immintrin.h>

int main()
{
    double vec[4] = { 2.6, 0.0, 0.0, 0.0 };
    auto reg = _mm256_load_pd(&vec[0]);
    for (int i = 0; i < 4; ++i)
    {
        std::cout << &vec[i] << std::endl;
    }
    return 0;
}