The following question is related; however, its answers are old, and a comment from user Marc Glisse suggests that there are new approaches to this problem since C++17 that might not be adequately discussed.
I'm trying to get aligned memory working properly for SIMD, while still having access to all of the data.
On Intel, if I create a vector of type __m256 and reduce its size by a factor of 8, it gives me aligned memory, e.g. std::vector<__m256> mvec_a((N*M)/8);
In a slightly hacky way, I can cast pointers to the vector's elements to float*, which allows me to access individual float values.
Instead, I would prefer to have a std::vector<float> which is correctly aligned, and thus can be loaded into __m256 and other SIMD types without segfaulting.
I've been looking into aligned_alloc.
This can give me a C-style array that is correctly aligned:
const std::size_t align_sz = 32;
// Note: aligned_alloc expects the size to be a multiple of the alignment.
float* marr_a = static_cast<float*>(aligned_alloc(align_sz, N*M*sizeof(float)));
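For reference, here is a slightly fuller sketch of that C-style approach. It assumes a C++17 standard library that provides std::aligned_alloc (as far as I know, MSVC does not). The names round_up, FreeDeleter, AlignedFloats and make_aligned_floats are made up for illustration. The byte count is rounded up to a multiple of the alignment, and the memory is released with std::free through a std::unique_ptr.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>

// Hypothetical helper: round n up to the next multiple of align.
constexpr std::size_t round_up(std::size_t n, std::size_t align) {
    return (n + align - 1) / align * align;
}

// Memory from std::aligned_alloc must be released with std::free.
struct FreeDeleter {
    void operator()(void* p) const noexcept { std::free(p); }
};

using AlignedFloats = std::unique_ptr<float[], FreeDeleter>;

// Hypothetical helper: allocate room for `count` floats, aligned to `align` bytes.
// Holds a null pointer if the allocation fails.
AlignedFloats make_aligned_floats(std::size_t count, std::size_t align = 32) {
    // aligned_alloc expects the size to be a multiple of the alignment.
    const std::size_t bytes = round_up(count * sizeof(float), align);
    return AlignedFloats(static_cast<float*>(std::aligned_alloc(align, bytes)));
}
```

With this, auto marr_a = make_aligned_floats(N*M); gives a 32-byte-aligned buffer that frees itself, but it is still not a std::vector<float>.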
However, I'm unsure how to do this for std::vector<float>. Giving the std::vector<float> ownership of marr_a doesn't seem to be possible.
I've seen some suggestions that I should write a custom allocator, but this seems like a lot of work. Perhaps modern C++ offers a better way?
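For concreteness, the custom-allocator route might look roughly like the minimal sketch below. It assumes C++17's aligned operator new (std::align_val_t). AlignedAllocator and AlignedFloatVec are names made up for illustration, not anything from the standard library, and I have not checked this beyond basic use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical minimal allocator: every allocation is aligned to Alignment bytes.
// Alignment is assumed to be a power of two.
template <class T, std::size_t Alignment>
struct AlignedAllocator {
    using value_type = T;

    AlignedAllocator() noexcept = default;
    template <class U>
    AlignedAllocator(const AlignedAllocator<U, Alignment>&) noexcept {}

    // Needed explicitly because of the non-type template parameter.
    template <class U>
    struct rebind { using other = AlignedAllocator<U, Alignment>; };

    T* allocate(std::size_t n) {
        if (n > static_cast<std::size_t>(-1) / sizeof(T))
            throw std::bad_array_new_length();
        // C++17 aligned allocation function.
        return static_cast<T*>(
            ::operator new(n * sizeof(T), std::align_val_t(Alignment)));
    }

    void deallocate(T* p, std::size_t) noexcept {
        // Must be paired with the aligned form of operator delete.
        ::operator delete(p, std::align_val_t(Alignment));
    }
};

// The allocator is stateless, so all instances compare equal.
template <class T, class U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept {
    return true;
}
template <class T, class U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept {
    return false;
}

using AlignedFloatVec = std::vector<float, AlignedAllocator<float, 32>>;
```

An AlignedFloatVec mvec_a(N*M); would then keep data() 32-byte aligned across reallocations. Note that it is a different type from std::vector<float>, so it can't be passed where a plain std::vector<float>& is expected.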