CUDA Copying multiple arrays of structs with cudaMemcpy

Question

Suppose a struct X with some primitives and an array of Y structs:

typedef struct 
{ 
   int a;    
   Y** y;
} X;

An instance X1 of X is initialized at the host, and then copied to an instance X2 of X, on the device memory, through cudaMemcpy.

This works fine for all the primitives in X (such as int a), but cudaMemcpy seems to flatten any double pointer into a single pointer, thus causing out of bounds exceptions wherever there's an access to the struct arrays in X (such as y).

In this case am I supposed to use another memcpy function, such as cudaMemcpy2D or cudaMemcpyArrayToArray?

Suggestions are much appreciated. Thanks!

edit

The natural approach (as in "that's what I'd do if it were just C") towards copying an array of structures would be to cudaMalloc the array and then cudaMalloc and initialize each element separately, e.g.:

X** h_x;
X** d_x;
int num_x;

cudaMalloc((void**)&d_x, sizeof(X)*num_x);

int i=0;
for(;i<num_x;i++)
{
    cudaMalloc((void**)d_x[i], sizeof(X));
    cudaMemcpy(&d_x[i], &h_x[i], sizeof(X), cudaMemcpyHostToDevice);
}

However, the cudaMalloc inside the for loop crashes. I confess I'm not yet comfortable with the usage of pointers in CUDA functions, so perhaps I got the cudaMalloc and cudaMemcpy parameters wrong?

CUDA compute capability 2.0 and above supports double precision operations. Otherwise, the compiler would cast double to float. Please note that the compilation would complete without any errors. – TripleS, Jul 22 '13 at 09:02
I told you that double pointers (`**`) make this extra challenging. If you want to see how to copy `**` arrays from host to device, look [here](http://stackoverflow.com/questions/6137218/cuda-2d-array-problem/6137517#6137517). It's not for the faint of heart. Note that a.lasram is suggesting flattening *on the host* first. I also suggest you accept the answer given by a.lasram, and post new questions if you have them. It makes the question messy and confusing for others to read when you make wholesale edits and post mostly new questions in your old one that's already been answered. – Robert Crovella, Jul 23 '13 at 03:48

score 4 · Accepted Answer · edited Jul 22 '13 at 00:48

cudaMemcpy, cudaMemcpy2D and cudaMemcpyArrayToArray all copy from a contiguous memory region in the host to a contiguous memory region on the device.

You have to copy all your data into an intermediate contiguous buffer on the host, and then send that buffer to the device.
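As a rough illustration of that flattening step, here is a minimal host-side sketch in plain C. It is not the asker's actual code: the definition of `Y`, the `num_y` count field, the `FlatX` record and the `flatten` function are all assumptions made up for the example, since the question shows none of them. The idea is to replace each `Y**` with an integer offset into one shared `Y` array, so the whole thing lives in a single block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical element type: the question never shows how Y is defined. */
typedef struct { float v; } Y;

/* Host struct from the question, plus an assumed element count. */
typedef struct {
    int a;
    int num_y;  /* assumption: how many Y objects y points to */
    Y **y;      /* num_y pointers, each to a separately allocated Y */
} X;

/* Pointer-free record: y becomes an index into one shared Y array. */
typedef struct {
    int a;
    int num_y;
    int y_offset;  /* index of this record's first Y */
} FlatX;

/* Packs n X structs into one malloc'd block laid out as
 * [n FlatX records][all Y objects]. Writes the block size to *bytes
 * and returns the block, or NULL if allocation fails. */
void *flatten(const X *xs, int n, size_t *bytes)
{
    assert(xs != NULL && n > 0 && bytes != NULL);

    int total_y = 0;
    for (int i = 0; i < n; i++)
        total_y += xs[i].num_y;

    size_t header = sizeof(FlatX) * (size_t)n;
    /* The Y array starts right after the header, so it must stay aligned. */
    assert(header % _Alignof(Y) == 0);

    *bytes = header + sizeof(Y) * (size_t)total_y;
    char *buf = (char *)malloc(*bytes);
    if (buf == NULL)
        return NULL;

    FlatX *fx = (FlatX *)buf;
    Y *ys = (Y *)(buf + header);
    int next = 0;
    for (int i = 0; i < n; i++) {
        fx[i].a = xs[i].a;
        fx[i].num_y = xs[i].num_y;
        fx[i].y_offset = next;
        for (int j = 0; j < xs[i].num_y; j++)
            ys[next++] = *xs[i].y[j];  /* copy the Y itself, not its pointer */
    }
    return buf;
}
```

With a block like this, a single `cudaMalloc` of `bytes` bytes followed by a single `cudaMemcpy(d_buf, buf, bytes, cudaMemcpyHostToDevice)` should be enough to move everything. On the device, the j-th `Y` of record i would then be read as `ys[fx[i].y_offset + j]`, with no host pointers involved.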

answered Jul 22 '13 at 00:41 by a.lasram · edited Jul 22 '13 at 00:48 by Robert Crovella

For additional discussion you may be interested in my answer [here](http://stackoverflow.com/questions/15431365/cudamemcpy-segmentation-fault/15435592#15435592). The double pointer (**) is even more challenging. It's recommended that you flatten your data into a contiguous area, as a.lasram is suggesting. – Robert Crovella Jul 22 '13 at 00:51
Thanks for the answer. By flattening the data, do you mean something like serializing the entire struct into a byte array? – Leo Brito Jul 22 '13 at 17:53
@brito yes, serializing the entire struct into a contiguous byte array – a.lasram Jul 22 '13 at 18:26
@a.lasram, is there a way of copying the entire struct using only cudaMallocs and cudaMemcpys though? I'm reluctant towards serialization due to the complexity of the structures involved. I edited the original post with the code I'm trying to work out. Thanks. – Leo Brito Jul 23 '13 at 00:57
@brito maybe you're misinterpreting X**, X* and X, and that's what causes the crash. Note that cudaMalloc((void**)&d_x, sizeof(X)*num_x) allocates room for n X* pointers, where n=(sizeof(X)*num_x)/sizeof(X*). It's not a good idea to keep the same memory structure as on the host. The device DRAM is optimized for high bandwidth while latency can be high, and caches are optimized for high reuse within the same warp. Scattering the data will hurt performance. – a.lasram Jul 23 '13 at 01:23
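The size arithmetic in that last comment can be checked on the host without touching the device. The sketch below is only an illustration, and the helper names `slots_requested` and `pointer_array_bytes` are made up for it. Since `d_x` is declared as `X**`, the buffer it points to is an array of `X*` slots, so the request for `sizeof(X)*num_x` bytes yields more slots than the `num_x` that were intended.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Y Y;                  /* Y's definition is not needed here */
typedef struct { int a; Y **y; } X;  /* struct from the question */

/* How many X* slots fit in the sizeof(X)*num_x bytes the question requests. */
size_t slots_requested(size_t num_x)
{
    return sizeof(X) * num_x / sizeof(X *);
}

/* Bytes actually needed for an array of num_x pointers to X. */
size_t pointer_array_bytes(size_t num_x)
{
    assert(num_x > 0);
    return sizeof(X *) * num_x;
}
```

On a typical 64-bit host, `sizeof(X)` is 16 and `sizeof(X*)` is 8, so `slots_requested(num_x)` would come out as twice `num_x`. The array of pointers itself only needs `pointer_array_bytes(num_x)` bytes.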
