How would one efficiently transfer a (kind of) multidimensional array, defined like the array "A" (i.e.
int********* A;
) in "Convert multidimensional array to single dimensional in C", to a CUDA GPU? Thanks!
Since you've edited your question, I'll edit my response. Such an array (int********* A) is rather difficult to create: it requires nested loops of malloc calls, with the nesting depth equal to the array's dimensionality. That said, my response is similar to what I have already posted below. Either you run a parallel set of nested loops that perform the cudaMalloc and cudaMemcpy calls along the way, or you linearize the whole thing and transfer it in one step. For a two-dimensional array, I could consider suggesting either approach. For an N-dimensional array, the first method is simply madness, as illustrated in this sequence of SO questions. Therefore, I think you should certainly linearize a high-dimensional varying-row array before trying to transfer it to the device. The method of linearization is the subject of the previous question you refer to and is outside the scope of my answer here. Once linearized, the transfer is straightforward and can be done with a single cudaMalloc/cudaMemcpy pair.
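Presumably you are referring to arrays where the individual rows have different sizes (and are therefore malloc'ed independently). I think you have 2 choices:
1. Transfer the rows one at a time, with a separate cudaMalloc and cudaMemcpy for each row.
2. Linearize the array on the host into one contiguous buffer and transfer it with a single cudaMalloc and cudaMemcpy.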
In either case, you will have to consider carefully how the array is accessed, so that it is conveniently available on the GPU. The first method may be easier in this respect, since you will automatically have a device pointer for each row. For the second method, you may need to create a set of pointers on the device to match your row pointers on the host. Beyond that, your access mechanism on the device should be similar to the one on the host, since both use a set of row pointers to access the array.
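As a rough illustration of the linearized layout with per-row access, here is a minimal host-side sketch in plain C. It assumes a two-level array of int rows with known lengths, and it stores row offsets instead of row pointers, because offsets remain valid after the buffer is copied to the device while host pointers do not. The names PackedRows, pack_rows, packed_get and packed_free are hypothetical and only for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for a packed jagged array (illustrative names). */
typedef struct {
    int    *data;     /* all rows stored back to back */
    size_t *offsets;  /* offsets[r] = index of row r's first element;
                         offsets[nrows] = total element count */
    size_t  nrows;
} PackedRows;

/* Pack nrows independently allocated rows into one contiguous buffer.
 * rows[r] points to row_len[r] ints.
 * Returns 0 on success, -1 on allocation failure. */
int pack_rows(int *const *rows, const size_t *row_len, size_t nrows,
              PackedRows *out)
{
    assert(rows != NULL && row_len != NULL && out != NULL);

    size_t *offsets = malloc((nrows + 1) * sizeof *offsets);
    if (offsets == NULL) return -1;

    /* A prefix sum of the row lengths gives each row's starting index. */
    offsets[0] = 0;
    for (size_t r = 0; r < nrows; ++r)
        offsets[r + 1] = offsets[r] + row_len[r];

    size_t total = offsets[nrows];
    int *data = malloc((total ? total : 1) * sizeof *data);
    if (data == NULL) { free(offsets); return -1; }

    /* Copy each row to its slot in the contiguous buffer. */
    for (size_t r = 0; r < nrows; ++r)
        if (row_len[r] > 0)
            memcpy(data + offsets[r], rows[r], row_len[r] * sizeof *data);

    out->data = data;
    out->offsets = offsets;
    out->nrows = nrows;
    return 0;
}

/* Element (r, c) of the packed array. */
int packed_get(const PackedRows *p, size_t r, size_t c)
{
    assert(r < p->nrows && c < p->offsets[r + 1] - p->offsets[r]);
    return p->data[p->offsets[r] + c];
}

/* Release both host buffers. */
void packed_free(PackedRows *p)
{
    free(p->data);
    free(p->offsets);
    p->data = NULL;
    p->offsets = NULL;
    p->nrows = 0;
}
```

With this layout, data and offsets are each one contiguous block, so each could be sent to the device with its own cudaMalloc/cudaMemcpy pair, and a kernel could index elements with the same data[offsets[r] + c] expression.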
If instead you are referring to an ordinary multidimensional array (a[dim1][dim2][dim3]...), the transfer is straightforward, since the array is already contiguous in memory and accessible through a single pointer. If you remake the original varying-rows array as an ordinary multidimensional array whose number of columns equals the length of the longest row (leaving some elements unused in the shorter rows), you can take advantage of this technique instead. This is somewhat inefficient because you transfer unused elements, but accessing the array is straightforward.
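The padding idea can be sketched on the host as follows, again in plain C and for the two-level case only. The function names pad_rows and dense_index and the fill parameter are assumptions made for this example; the fill value simply marks the unused slots.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy jagged rows into a dense nrows x ncols buffer, where ncols is the
 * length of the longest row. Unused trailing slots are set to `fill`.
 * Returns the buffer (caller frees it) or NULL on allocation failure;
 * *ncols_out receives the padded row width. Illustrative helper only. */
int *pad_rows(int *const *rows, const size_t *row_len, size_t nrows,
              int fill, size_t *ncols_out)
{
    assert(rows != NULL && row_len != NULL && ncols_out != NULL);

    /* The padded width is the longest row length. */
    size_t ncols = 0;
    for (size_t r = 0; r < nrows; ++r)
        if (row_len[r] > ncols) ncols = row_len[r];

    size_t total = nrows * ncols;
    int *dense = malloc((total ? total : 1) * sizeof *dense);
    if (dense == NULL) return NULL;

    for (size_t r = 0; r < nrows; ++r) {
        int *dst = dense + r * ncols;
        if (row_len[r] > 0)
            memcpy(dst, rows[r], row_len[r] * sizeof *dst);
        /* Mark the slots past the end of a short row as unused. */
        for (size_t c = row_len[r]; c < ncols; ++c)
            dst[c] = fill;
    }

    *ncols_out = ncols;
    return dense;
}

/* Row-major index of element (r, c) in the padded buffer. */
size_t dense_index(size_t r, size_t c, size_t ncols)
{
    return r * ncols + c;
}
```

The resulting buffer is a single block of nrows * ncols ints, so it could be transferred in one step, and the r * ncols + c index works unchanged on the device. The cost is the padding: rows much shorter than the longest one waste both memory and transfer bandwidth.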
If you have truly sparse matrices, you might also want to consider sparse matrix representation methods. cusp is one library for handling and manipulating these on the GPU.
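To give a feel for what a sparse representation looks like, here is a small host-side sketch of the common compressed sparse row (CSR) layout in plain C. The Csr struct and dense_to_csr function are hypothetical names for illustration and are not cusp's API; a library such as cusp provides its own containers for this.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical CSR container; field names are illustrative, not cusp's API. */
typedef struct {
    size_t *row_ptr;  /* nrows + 1 entries; row r spans
                         [row_ptr[r], row_ptr[r + 1]) in col_idx and val */
    size_t *col_idx;  /* column of each stored nonzero */
    int    *val;      /* value of each stored nonzero */
    size_t  nrows, nnz;
} Csr;

/* Build CSR from a dense row-major nrows x ncols matrix.
 * Returns 0 on success, -1 on allocation failure. */
int dense_to_csr(const int *dense, size_t nrows, size_t ncols, Csr *out)
{
    assert(dense != NULL && out != NULL);

    /* First pass: count the nonzeros to size the arrays. */
    size_t nnz = 0;
    for (size_t i = 0; i < nrows * ncols; ++i)
        if (dense[i] != 0) ++nnz;

    size_t *row_ptr = malloc((nrows + 1) * sizeof *row_ptr);
    size_t *col_idx = malloc((nnz ? nnz : 1) * sizeof *col_idx);
    int    *val     = malloc((nnz ? nnz : 1) * sizeof *val);
    if (!row_ptr || !col_idx || !val) {
        free(row_ptr); free(col_idx); free(val);
        return -1;
    }

    /* Second pass: record each nonzero and where every row starts. */
    size_t k = 0;
    for (size_t r = 0; r < nrows; ++r) {
        row_ptr[r] = k;
        for (size_t c = 0; c < ncols; ++c) {
            int v = dense[r * ncols + c];
            if (v != 0) { col_idx[k] = c; val[k] = v; ++k; }
        }
    }
    row_ptr[nrows] = k;

    out->row_ptr = row_ptr;
    out->col_idx = col_idx;
    out->val = val;
    out->nrows = nrows;
    out->nnz = nnz;
    return 0;
}
```

All three arrays are contiguous, so a CSR matrix needs only three transfers regardless of how many rows it has.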
This answer may also be of interest.