What is the "correct" (i.e., portable) way in LLVM to load data from memory into a SIMD vector?
Looking at the typical IR generated by LLVM's auto-vectorizer for an x86 target, it seems like the pattern is:
- bitcast a pointer to the scalar type (e.g., `double *`) to the corresponding vector type (e.g., `<4 x double>*`),
- load from the converted pointer while taking into account alignment considerations (i.e., don't use the natural alignment of the vector type, but the alignment of the corresponding scalar type).
In the case of AVX, this pattern maps nicely to SIMD intrinsics such as `_mm256_loadu_pd()` and friends. However, I have no idea if this strategy would also be correct for other ISAs (e.g., Neon, AltiVec).
I haven't been able to find info on the topic in the LLVM docs. Am I missing something obvious?