I am parallelizing a certain dynamic programming problem using AVX2/SSE instructions.
In the main iteration of my calculation, I calculate column in matrix where each cell is a structure of AVX2 registers (_m256i
). I use values from the previous matrix column as input values for calculating the current column. Columns can be big, so what I do is I have an array of structures (on stack), where each structure has two _m256i
elements.
Structure:
struct Cell {
_m256i first;
_m256i second;
};
An then I have array like this: Cell prevColumn [N]
. N
will tipically be few hundreds.
I know that _m256i
basically represents an avx2 register, so I am wondering how should I think about this array, how does it behave, since N is much larger than 16 (which is number of avx registers)? Is it a good practice to create such an array, or is there some better approach that i should use when storing a lot of _m256i
values that are going to be reused real soon?
Also, is there any aligning I should be doing with this structures? I read a lot about aligning, but I am still not sure how and when to do it exactly.