When writing vectorized code, you sometimes want to perform memory-aligned operations.
So let's say I have an unsigned char[] that ends before a 16-byte boundary, but I want to load the entire 16-byte-aligned block containing its last bytes at once (and then presumably mask off the data I didn't need).
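The address arithmetic and masking behind this idea can be sketched portably. This is a minimal sketch under stated assumptions: the helper names (align_down16, valid_byte_mask16, load_block_masked) are hypothetical, converting pointers through uintptr_t is assumed to behave as a plain address on these compilers (the standard leaves it implementation-defined), and the block is read with memcpy only when the whole 16-byte block lies inside one object. Reading a block that runs past the end of the array is exactly the case this question asks about, so the sketch deliberately avoids it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: round an address down to the start of its
   16-byte-aligned block (assumes uintptr_t round-trips addresses). */
static const unsigned char *align_down16(const unsigned char *p)
{
    return (const unsigned char *)((uintptr_t)p & ~(uintptr_t)15);
}

/* Hypothetical helper: bit i of the result is set when byte i of the
   aligned block containing p lies inside [p, p + n).
   Assumes the range does not cross the end of that block. */
static uint16_t valid_byte_mask16(const unsigned char *p, size_t n)
{
    size_t offset = (size_t)((uintptr_t)p & 15);
    assert(offset + n <= 16);
    uint32_t ones = (UINT32_C(1) << n) - 1u; /* n low bits set, n <= 16 */
    return (uint16_t)(ones << offset);
}

/* Copy the whole aligned block into out[] and zero every byte outside
   [p, p + n). The memcpy is only well-defined here because the caller
   guarantees the entire 16-byte block is inside a single object; an
   over-read past the array is NOT covered by this sketch. */
static void load_block_masked(const unsigned char *p, size_t n,
                              unsigned char out[16])
{
    const unsigned char *block = align_down16(p);
    uint16_t mask = valid_byte_mask16(p, n);
    memcpy(out, block, 16);
    for (int i = 0; i < 16; i++)
        if (!(mask & (1u << i)))
            out[i] = 0;
}
```

For example, with a 16-byte-aligned buffer, load_block_masked(buf + 3, 5, out) reads the block starting at buf and keeps only bytes 3 through 7. Computing a mask separately like this lets the same arithmetic drive a vector blend or a compare-and-mask, regardless of the vector width used.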
What is the proper way to do this across Clang, MSVC, and GCC without triggering undefined behavior?
(Let's not assume any particular vector types or operations; I'm hoping for approaches that work equally well for an unaligned __m256i as for an unaligned unsigned int.)
This is safely possible in asm, and libc functions like strlen already do it. But I'm asking how to do it with these three mainstream compilers. At the language level it's undefined behavior, at least according to the ISO standard, but do any of them provide extensions or otherwise define the behavior of doing this?