CUDA: Is it better to use M[][] for 2D static arrays or flatten them to M[]?

Question

Is it better to use the double pointer syntax M[][] to define a matrix and then access the elements like

for (int i=0; i<height; ++i)
{
    for (int j=0; j<width; ++j)
    {
        // do something with M[i][j]
    }
}

or to flatten the matrix into a one-dimensional array, define it as M[], and then access its elements like

for (int i=0; i<height; ++i)
{
    for (int j=0; j<width; ++j)
    {
        int index = j + i * height;
        // do something with M[index]
    }        
}
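A minimal sketch of the second approach with the stride written correctly, assuming a row-major layout (the same layout C and C++ use for a static `float M[H][W]`, which is why `M[i][j]` and the flattened element address the same value). The helpers `flat_index` and `flatten` are hypothetical names chosen for illustration; they are not part of any CUDA or standard API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major flattening: element (i, j) of a height x width matrix is stored
// at index j + i * width in a 1D buffer (width, not height, is the stride).
inline std::size_t flat_index(std::size_t i, std::size_t j,
                              std::size_t height, std::size_t width) {
    assert(i < height && j < width);  // catch out-of-range indices in debug builds
    return j + i * width;
}

// Copies a static H x W array into a flat vector using flat_index.
template <std::size_t H, std::size_t W>
std::vector<float> flatten(const float (&M)[H][W]) {
    std::vector<float> flat(H * W);
    for (std::size_t i = 0; i < H; ++i)
        for (std::size_t j = 0; j < W; ++j)
            flat[flat_index(i, j, H, W)] = M[i][j];
    return flat;
}
```

Routing every access through one helper keeps the stride in a single place, which guards against exactly the `i * height` slip pointed out in the comments.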

The first way is definitely better because the second way `i * height` is wrong. It should be `i * width`. — Steve Jessop, Nov 03 '13 at 23:14
http://stackoverflow.com/questions/11986632/how-to-find-out-which-nested-for-loop-is-better/11986732#11986732 — titus, Nov 03 '13 at 23:18
it should be the same, if you write it properly; although I don't know the details for CUDA — titus, Nov 03 '13 at 23:19
you should save `i * height` and not compute it every iteration; maybe the compiler is smart enough to do this — titus, Nov 03 '13 at 23:23
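The suggestion in the last comment amounts to hoisting the row offset out of the inner loop. A small sketch, assuming a row-major matrix in a flat `std::vector<double>`; `sum_matrix` is a hypothetical example function. An optimizing compiler typically performs this loop-invariant hoisting on its own, so writing it by hand mostly documents intent.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums a height x width row-major matrix stored in a flat vector.
// The row offset i * width is computed once per row, not once per element.
double sum_matrix(const std::vector<double>& M,
                  std::size_t height, std::size_t width) {
    assert(M.size() == height * width);
    double total = 0.0;
    for (std::size_t i = 0; i < height; ++i) {
        const std::size_t row = i * width;  // hoisted row offset
        for (std::size_t j = 0; j < width; ++j)
            total += M[row + j];
    }
    return total;
}
```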

score 4 · Accepted Answer · edited May 23 '17 at 12:13

Without getting into a discussion about the exact index calculation (any bugs in which you will fix when it comes time to implement) or about how to optimize it (which a good compiler will do for you), I personally would prefer the second approach with CUDA.

The reason is that transferring data of this form back and forth between device and host is quite a bit easier with the second form: the flattened matrix is a single contiguous block, so one cudaMalloc and one cudaMemcpy move all of it.

There are many questions that explain why, so I won't go into detail here. Just search for "CUDA 2D array" using the search box in the upper right-hand corner, and you'll see the complexity involved in transferring a double-pointer (i.e. ** or [][]) array between device and host in CUDA. Here is one example; take a look at the answer given by talonmies.

score 2 · Answer 2 · answered Nov 04 '13 at 00:34

Robert Crovella has already told you that the second approach makes copying between host and device easier, but there is another reason why a one-dimensional array is better: if the matrix is built as an array of row pointers, the first version has to dereference two pointers (first the row pointer, then the element), which costs an additional memory read. Especially on the GPU, memory reads are significantly slower than computing the index as in the second version. Thus I would use the one-dimensional array even if the matrix is never copied between host and device.
