
Disclaimer: I'm a SIMD newbie, so forgive this filthy peasant if he asks some bad questions.

From my understanding, AVX-512 architectures can process up to 16 float variables at once, while AVX2 "only" 8.

In order to take advantage of this, the data has to be aligned. As I found out here, this can be done with:

For AVX-512:

alignas(64) float a[16];

For AVX2:

alignas(32) float a[8];

Ok, so my first question is: since 32 is a factor of 64, why don't we always use alignas(64), also for AVX2 architectures? Maybe (probably) I'm missing something.
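To be concrete about how I expect these aligned buffers to be used, here is a minimal sketch with AVX2 intrinsics (assuming <immintrin.h> and compilation with something like -mavx2; the aligned load/store is the part that needs the alignas above):

#include <immintrin.h>

// Minimal sketch: add two 8-float buffers with one AVX2 instruction per step.
// _mm256_load_ps / _mm256_store_ps want 32-byte aligned addresses, which is
// why the buffers below are declared with alignas(32).
void add8(const float *a, const float *b, float *out){
    __m256 va = _mm256_load_ps(a);                 // aligned load of 8 floats
    __m256 vb = _mm256_load_ps(b);
    _mm256_store_ps(out, _mm256_add_ps(va, vb));   // 8 additions in one instruction
}

int main(){
    alignas(32) float a[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    alignas(32) float b[8] = {7, 6, 5, 4, 3, 2, 1, 0};
    alignas(32) float c[8];
    add8(a, b, c); // every element of c becomes 7
}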

Then, I have this function:

bool interpolate(const Mat &im, Mat &res, /*...*/){/*...*/}

where the two matrices im and res are allocated with:

cv::Mat im(r, c, CV_32FC1); //similarly res

The Intel compiler report tells me that these two matrices are not aligned. So my second question is: how can I allocate them so they are 32/64-byte aligned? I could allocate an aligned pointer and then pass it to the cv::Mat constructor, something like:

 float *aligned_ptr = /* allocate r*c floats, 32/64-byte aligned */;
 cv::Mat m(r, c, CV_32FC1, /* use aligned_ptr somehow */);
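To make that idea concrete, here is a minimal sketch of what I have in mind (assuming C++17 std::aligned_alloc and the cv::Mat constructor that wraps an external buffer without copying; make_aligned_mat is just a placeholder name, and I'm not sure this is the recommended way):

#include <opencv2/core.hpp>
#include <cstdlib>

// Sketch: allocate a 64-byte aligned buffer and wrap it in a cv::Mat.
// Note: a Mat built on external data does NOT own the buffer, so it has to be
// freed manually (with std::free) once the Mat is no longer used.
cv::Mat make_aligned_mat(int r, int c){
    // std::aligned_alloc requires the size to be a multiple of the alignment
    size_t bytes = (static_cast<size_t>(r) * c * sizeof(float) + 63) / 64 * 64;
    float *aligned_ptr = static_cast<float*>(std::aligned_alloc(64, bytes));
    // step = c * sizeof(float): rows are packed, so every row stays 64-byte
    // aligned only if c * sizeof(float) is itself a multiple of 64
    return cv::Mat(r, c, CV_32FC1, aligned_ptr, c * sizeof(float));
}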
  • "AVX-512 architectures can process up to 16 float variables at once, while AVX2 'only' 8" - that's only if you want to apply the same operation to all of them, though, so with a traditional matrix layout, it doesn't make sense to handle all of them at once. You don't want to multiply every value by 10 very often. It's called SIMD (single instruction, multiple data) for this reason. – xaxxon May 01 '17 at 13:36
  • You can use placement new to specify the memory for the constructor to construct the object in: new (buffer) cv::Mat(...constructor params...); I think there is something about having to explicitly call the destructor when you do this? Not sure, make sure to read up on placement new to make sure you're doing it right. Also, be wary of doing array allocations with placement new and an aligned address, as it may have additional overhead for storing the size of the array. – xaxxon May 01 '17 at 13:36
  • If you really want to learn about SIMD, I recommend looking at and learning the Intel intrinsics functions: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_cmp&expand=663 They're basically C wrappers of the actual SIMD machine instructions - meaning you don't have to use inline asm to get the results you want. ##opengl on freenode is a good resource for all the things you're asking about recently. Matrix stuff and optimization techniques are considered reasonably on topic there. – xaxxon May 01 '17 at 13:39
  • if you need to do massive amounts of matrix operations, to use SIMD for significant speedups, you have to have your data in "struct of array" format, not "array of struct" so you can load the "top right" value of 16 different matrices from sequential data and apply desired change to all of them with a single SIMD operation. Repeat for each element of the matrix. All the above comments are from my VERY limited understanding of how this stuff works. – xaxxon May 01 '17 at 13:47
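
To make sure I understand the "struct of arrays" suggestion in the last comment, here is a minimal sketch of the layout I think is being described (the Mat2x2SoA type and scale_all function are just illustrative names, not anything from OpenCV; AVX2, 8 floats per register):

#include <immintrin.h>

// "Struct of arrays" layout: element (i,j) of 8 small matrices is stored
// contiguously, so one AVX2 load grabs that element from all 8 matrices.
struct alignas(32) Mat2x2SoA {
    float m00[8];   // top-left element of matrices 0..7
    float m01[8];   // top-right
    float m10[8];   // bottom-left
    float m11[8];   // bottom-right
};

// Scale all 8 matrices by the same factor: 4 SIMD multiplies instead of 32 scalar ones.
void scale_all(Mat2x2SoA &a, float factor){
    __m256 f = _mm256_set1_ps(factor);
    _mm256_store_ps(a.m00, _mm256_mul_ps(_mm256_load_ps(a.m00), f));
    _mm256_store_ps(a.m01, _mm256_mul_ps(_mm256_load_ps(a.m01), f));
    _mm256_store_ps(a.m10, _mm256_mul_ps(_mm256_load_ps(a.m10), f));
    _mm256_store_ps(a.m11, _mm256_mul_ps(_mm256_load_ps(a.m11), f));
}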

0 Answers