Disclaimer: I'm a SIMD newbie, so please forgive this filthy peasant for any bad questions.
From my understanding, AVX-512 can process up to 16 floats at once (512-bit registers), while AVX2 "only" 8 (256-bit registers).
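To take full advantage of this, the data should be aligned (unaligned loads and stores still work, but they can be slower, especially when an access splits a cache line). As I found out here, this can be done with: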
For AVX-512 (64-byte registers):
alignas(64) float a[16];
For AVX2 (32-byte registers):
alignas(32) float a[8];
OK, so my first question is: since 32 is a factor of 64, why don't we always use alignas(64) also for AVX2 architectures? Maybe (probably) I'm missing something.
Then, I have this function:
bool interpolate(const Mat &im, Mat &res, /*...*/){/*...*/}
where im and res are allocated with:
cv::Mat im(r, c, CV_32FC1); //similarly res
The Intel compiler's report tells me that these two matrices are not aligned. So my second question is: how can I allocate them so they are 32- or 64-byte aligned? I could allocate an aligned pointer and then pass it to the cv::Mat constructor, something like:
float *aligned_ptr = /*allocate r*c 32/64-byte-aligned floats*/
cv::Mat m(r, c, CV_32FC1, /* use aligned_ptr somehow */);