CUDA, dynamic array + array. malloc and copy

Question

So I have been stuck on this problem for a while. My struct looks like this:

typedef struct
{
    int _size;
    int _dim[DIMENSIONS];
    float *_data;
} matrix;

Now the problem for me is how to malloc and memcpy. This is how I'm doing it:

matrix * d_in;
matrix * d_out;
const int THREADS_BYTES = sizeof(int) + sizeof(int)*DIMENSIONS + sizeof(float)*h_A->_size;
cudaMalloc((void **) &d_in, THREADS_BYTES);
cudaMemcpy(d_in, h_A, THREADS_BYTES, cudaMemcpyHostToDevice);

EDIT: this is how I allocated h_A:

 matrix  A; // = (matrix*)malloc(sizeof(matrix));
 A._dim[0] = 40;
 A._dim[1] = 60;
 A._size = A._dim[0]*A._dim[1];
 A._data = (float*)malloc(A._size*sizeof(float));
 matrix *h_A = &A;

Where h_A is a matrix I allocated. I call my kernel like this:

DeviceComp<<<gridSize, blockSize>>>(d_out, d_in);

However, in my kernel I cannot reach the contents of the data pointer in the struct, only the dim array and the size variable.

score 0 · Accepted Answer · edited May 23 '17 at 12:13

This is a common problem. When you called malloc on the host (for h_A->_data), you allocated host memory, which is not accessible from the device. Copying the struct to the device copies only the pointer value, not the memory it points to.

This answer describes in some detail what is going on and how to fix it.
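To see the problem directly, here is a minimal sketch (not your actual program). It assumes DIMENSIONS is 2 and uses the underscore field names from your code; the function name device_copy_holds_host_pointer is made up for illustration. It copies the struct to the device, copies it straight back, and checks whether the pointer stored in the device copy is still the host address returned by malloc:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define DIMENSIONS 2  // assumption: the question never shows this value

typedef struct {
    int _size;
    int _dim[DIMENSIONS];
    float *_data;
} matrix;

// Hypothetical helper: returns true if the struct stored on the device
// still carries the host address in _data after a plain struct copy.
bool device_copy_holds_host_pointer() {
    matrix A;
    A._dim[0] = 40;
    A._dim[1] = 60;
    A._size = A._dim[0] * A._dim[1];
    A._data = (float *)malloc(A._size * sizeof(float));  // host memory

    matrix *d_in = NULL;
    matrix back;
    back._data = NULL;
    bool same = false;

    // Copy only the struct itself to the device, then read it back.
    if (cudaMalloc((void **)&d_in, sizeof(matrix)) == cudaSuccess &&
        cudaMemcpy(d_in, &A, sizeof(matrix), cudaMemcpyHostToDevice) == cudaSuccess &&
        cudaMemcpy(&back, d_in, sizeof(matrix), cudaMemcpyDeviceToHost) == cudaSuccess) {
        // The pointer value travelled with the struct; the floats did not.
        same = (back._data == A._data);
    }

    cudaFree(d_in);
    free(A._data);
    return same;
}

int main() {
    if (device_copy_holds_host_pointer())
        printf("device struct still holds the host pointer\n");
    else
        printf("pointers differ, or a CUDA call failed\n");
    return 0;
}
```

Any kernel that dereferences _data through that device struct would be following a host address, which is why the ints are readable but the float array is not.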

In your case, something like this should work:

matrix  A; // = (matrix*)malloc(sizeof(matrix));
A._dim[0] = 40;
A._dim[1] = 60;
A._size = A._dim[0]*A._dim[1];
A._data = (float*)malloc(A._size*sizeof(float));
matrix *h_A = &A; 

float *d_data;
cudaMalloc((void **) &d_data, A._size*sizeof(float));


matrix * d_in;
matrix * d_out;
const int THREADS_BYTES = sizeof(matrix); // the struct only; the data area lives in d_data
cudaMalloc((void **) &d_in, THREADS_BYTES);
cudaMemcpy(d_in, h_A, THREADS_BYTES, cudaMemcpyHostToDevice);

// overwrite the _data member of the device struct with the device pointer
cudaMemcpy(&(d_in->_data), &d_data, sizeof(float *), cudaMemcpyHostToDevice);

Note that this doesn't actually copy the data area from the host copy of A to the device copy. It only creates a device-accessible data area, equal in size to the host data area. If you also want to copy the contents of the data area, that requires one more cudaMemcpy operation, from h_A->_data to d_data.
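Putting the pieces together, including that extra copy of the data area, a complete round trip might look like the sketch below. The assumptions are mine: DIMENSIONS is taken to be 2, scaleKernel is a made-up stand-in for DeviceComp, and CHECK and run_demo are illustrative helpers that are not part of the original code.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define DIMENSIONS 2  // assumption: the question never shows this value

// Illustrative helper: abort with a message if a CUDA call fails.
#define CHECK(call)                                              \
    do {                                                         \
        cudaError_t err_ = (call);                               \
        if (err_ != cudaSuccess) {                               \
            fprintf(stderr, "CUDA error: %s\n",                  \
                    cudaGetErrorString(err_));                   \
            exit(1);                                             \
        }                                                        \
    } while (0)

typedef struct {
    int _size;
    int _dim[DIMENSIONS];
    float *_data;
} matrix;

// Hypothetical kernel standing in for DeviceComp: doubles every element.
__global__ void scaleKernel(matrix *m) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < m->_size) m->_data[i] *= 2.0f;
}

// Returns the number of elements that came back with the wrong value.
int run_demo() {
    matrix A;
    A._dim[0] = 40;
    A._dim[1] = 60;
    A._size = A._dim[0] * A._dim[1];
    size_t bytes = A._size * sizeof(float);
    A._data = (float *)malloc(bytes);
    for (int i = 0; i < A._size; ++i) A._data[i] = (float)i;

    // 1. Device buffer for the data area, filled from the host data area.
    float *d_data = NULL;
    CHECK(cudaMalloc((void **)&d_data, bytes));
    CHECK(cudaMemcpy(d_data, A._data, bytes, cudaMemcpyHostToDevice));

    // 2. Device copy of the struct itself (its _data is still a host address).
    matrix *d_in = NULL;
    CHECK(cudaMalloc((void **)&d_in, sizeof(matrix)));
    CHECK(cudaMemcpy(d_in, &A, sizeof(matrix), cudaMemcpyHostToDevice));

    // 3. Patch _data inside the device struct so it points at d_data.
    CHECK(cudaMemcpy(&(d_in->_data), &d_data, sizeof(float *),
                     cudaMemcpyHostToDevice));

    int threads = 256;
    int blocks = (A._size + threads - 1) / threads;
    scaleKernel<<<blocks, threads>>>(d_in);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    // 4. Copy results back through d_data, the host-side handle to the buffer.
    CHECK(cudaMemcpy(A._data, d_data, bytes, cudaMemcpyDeviceToHost));

    int mismatches = 0;
    for (int i = 0; i < A._size; ++i)
        if (A._data[i] != 2.0f * (float)i) ++mismatches;

    CHECK(cudaFree(d_in));
    CHECK(cudaFree(d_data));
    free(A._data);
    return mismatches;
}

int main() {
    int bad = run_demo();
    printf("mismatches: %d\n", bad);
    return bad != 0;
}
```

Keeping d_data around on the host is deliberate: the host cannot dereference d_in->_data, so d_data is the handle used both for copying results back and for cudaFree.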
