
This question is an extension of this question and related to this question.

[Q1] Do I need to cast to (void**) when calling cudaMalloc on a struct member? Example (please see the questions in the code comments):

The structure:

typedef struct {
  int a;
  int *b;
} Matrix;
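Because `b` is a pointer, a `Matrix` stores only an address, and `sizeof(Matrix)` does not depend on how many ints `b` points to. Copying `Matrix` structs with `memcpy`, as the `data` to `h_data` copy in `main` does, therefore copies the pointer values and not the ints. Here is a minimal host-only sketch that compiles as C or in a `.cu` file. The helper name `shallowCopy` is hypothetical:

```cuda
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
  int a;
  int *b;
} Matrix;

// Returns a malloc'd copy of numMat structs from src, or NULL on failure.
// This is a shallow copy: each copy's b holds the same address as the
// original's b, and none of the ints that b points to are duplicated.
Matrix *shallowCopy(const Matrix *src, int numMat)
{
  assert(numMat > 0);
  Matrix *dst = (Matrix *)malloc(numMat * sizeof(Matrix));
  if (dst != NULL)
    memcpy(dst, src, numMat * sizeof(Matrix));
  return dst;
}
```

After such a copy, `copy[i].b == src[i].b`, so writing through either one changes the same ints. This is why moving the struct to the GPU also requires a separate copy of the data that `b` points to.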

The main function for allocating and copying to device:

int main(void)
{
  int rows, cols, numMat = 2;  // rows and cols must be set (e.g., while reading the file) before use

  //[Q2] What would be the problem with not allocating numMat * sizeof(Matrix) here,
  //for example, allocating just sizeof(Matrix)?
  Matrix *data = (Matrix*)malloc(numMat * sizeof(Matrix));

  // ... Successfully read from file into "data" ...
  //[Q3] Do we really need to copy "data" to host?
  //[A3] Not necessary
  Matrix *h_data = (Matrix*)malloc(numMat * sizeof(Matrix));
  memcpy(h_data, data, numMat * sizeof(Matrix));

  // ... Copy the matrix data to the gpu ...
  //[Q4] Do we need to cast (void**)&(h_data->a)? 'a' is not a pointer.
  //[A4] An int cannot be copied in this fashion
  // cudaMalloc(&(h_data->a), rows*cols*sizeof(int));
  // cudaMemcpy(h_data->a, data->a, rows*cols*sizeof(int), cudaMemcpyHostToDevice);

  //[Q5] Do we need to cast (void**)&(h_data->b)? 'b' is a pointer
  cudaMalloc(&(h_data->b), rows*cols*sizeof(int));
  cudaMemcpy(h_data->b, data->b, rows*cols*sizeof(int), cudaMemcpyHostToDevice);

  // ... Copy the "meta" data to gpu ...
  //[Q6] Can we just copy h_data instead? Why create another pointer "d_data"?
  //[A6] Yes
  Matrix *d_data;

  //[Q7] Wouldn't we need to cast (void**)&d_data?
  cudaMalloc(&d_data, numMat*sizeof(Matrix));

  //[Q8] h_data lives on the host, but its b now points to device memory. Can we just copy "data" to the device?
  cudaMemcpy(d_data, h_data, numMat*sizeof(Matrix), cudaMemcpyHostToDevice);
  // ... Do other things ...
}
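Here is a sketch of the full deep copy for all `numMat` matrices. It assumes, as in the question, that every `b` holds the same number `n` of ints (e.g. `rows*cols`), and the helper names `copyMatricesToDevice` and `freeDeviceMatrices` are hypothetical. It illustrates three points:

- `b` has to be allocated and copied for every element, not only for `h_data->b`, which is `h_data[0].b`.
- `a` is a plain `int`, so it travels by value inside the struct copy.
- Copying `data` itself to the device would produce device structs whose `b` still holds host addresses, which a kernel cannot, in general, dereference.

In a `.cu` file, `cuda_runtime.h` provides a template overload `cudaMalloc(T **, size_t)`, so `&d_b` and `&d_data` need no `(void**)` cast. Plain C code would write `cudaMalloc((void **)&d_b, ...)` instead.

```cuda
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <cuda_runtime.h>

typedef struct {
  int a;
  int *b;
} Matrix;

// Deep-copies numMat host matrices, each b holding n ints, to the device.
// On success returns the device array and stores in *staging a host array
// whose b fields hold the device buffers (needed later for cudaFree).
// Returns NULL on failure, after releasing whatever was allocated.
Matrix *copyMatricesToDevice(const Matrix *data, int numMat, size_t n,
                             Matrix **staging)
{
  assert(numMat > 0 && n > 0);
  Matrix *h_data = (Matrix *)malloc(numMat * sizeof(Matrix));
  Matrix *d_data = NULL;
  int done = 0;                  // number of device b buffers allocated
  bool ok = (h_data != NULL);
  if (ok)
    memcpy(h_data, data, numMat * sizeof(Matrix));  // copies each a by value

  for (; ok && done < numMat; ++done) {
    int *d_b = NULL;
    // No (void**) cast: the template overload cudaMalloc(T **, size_t) applies.
    if (cudaMalloc(&d_b, n * sizeof(int)) != cudaSuccess) {
      ok = false;
      break;                     // d_b was not allocated, so done stays put
    }
    h_data[done].b = d_b;        // staging copy now holds a device address
    ok = cudaMemcpy(d_b, data[done].b, n * sizeof(int),
                    cudaMemcpyHostToDevice) == cudaSuccess;
  }

  if (ok && cudaMalloc(&d_data, numMat * sizeof(Matrix)) != cudaSuccess) {
    d_data = NULL;
    ok = false;
  }
  // Copies every a and every device address in b into the device structs.
  if (ok)
    ok = cudaMemcpy(d_data, h_data, numMat * sizeof(Matrix),
                    cudaMemcpyHostToDevice) == cudaSuccess;

  if (!ok) {
    for (int i = 0; i < done; ++i)
      cudaFree(h_data[i].b);
    cudaFree(d_data);            // cudaFree(NULL) does nothing
    free(h_data);                // free(NULL) does nothing
    return NULL;
  }
  *staging = h_data;
  return d_data;
}

// Releases everything allocated by copyMatricesToDevice.
void freeDeviceMatrices(Matrix *d_data, Matrix *staging, int numMat)
{
  for (int i = 0; i < numMat; ++i)
    cudaFree(staging[i].b);
  cudaFree(d_data);
  free(staging);
}
```

The staging array is kept because the host cannot read `d_data[i].b` directly. It serves as the host-side record of the device buffers that `freeDeviceMatrices` releases.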

Ultimately, we just want to pass the Matrix array to the kernel as a pointer:

// Kernel call
doThings<<<dimGrid, dimBlock>>>(d_data);

The kernel definition:

__global__ void doThings(Matrix *matrices)
{
  matrices->a = ...;
  matrices->b = ...;
}
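With an array of structs on the device, the kernel should index the array (below with `blockIdx.y`) instead of writing through `matrices->`, which only touches the first element. This sketch makes a few assumptions. The kernel name `addAToB`, the launcher `launchAddAToB` and the operation itself (adding `a` to each element of `b`) are hypothetical. It also assumes `d_data` is a device array whose `b` fields already hold device addresses. A `__global__` function must return `void`.

```cuda
#include <assert.h>
#include <cuda_runtime.h>

typedef struct {
  int a;
  int *b;
} Matrix;

// Hypothetical kernel: adds matrices[m].a to every element of matrices[m].b.
// blockIdx.y selects the matrix; threads along x stride over its n ints.
__global__ void addAToB(Matrix *matrices, size_t n)
{
  Matrix *mat = &matrices[blockIdx.y];   // device copy of the struct
  for (size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x; i < n;
       i += (size_t)gridDim.x * blockDim.x)
    mat->b[i] += mat->a;                 // b must hold a device address here
}

// Launches addAToB over numMat matrices and waits for it to finish.
cudaError_t launchAddAToB(Matrix *d_data, int numMat, size_t n)
{
  assert(numMat > 0 && n > 0);           // a zero grid dimension is invalid
  dim3 dimBlock(256);
  dim3 dimGrid((unsigned)((n + dimBlock.x - 1) / dimBlock.x), numMat);
  addAToB<<<dimGrid, dimBlock>>>(d_data, n);
  cudaError_t err = cudaGetLastError();  // launch-configuration errors
  if (err != cudaSuccess)
    return err;
  return cudaDeviceSynchronize();        // errors raised while running
}
```

`cudaGetLastError` right after the launch catches configuration errors. `cudaDeviceSynchronize` reports errors raised during execution, such as an invalid address in `b`. `gridDim.y` is limited to 65535 blocks, so this layout assumes a moderate `numMat`.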

Thanks in advance for your time and help with my questions!

mrei
  • Copying structures with embedded pointers is somewhat involved. Your code appears to be missing at least one step conceptually. You might want to review my answer [here](http://stackoverflow.com/questions/15431365/cudamemcpy-segmentation-fault/15435592#15435592), which I'm tempted to mark as a duplicate. For your question 3, there is no difference in the type of storage between `data` and `h_data`. There is no need to allocate both and copy from one to the other, they are both on the "host". – Robert Crovella Feb 27 '14 at 20:28
  • @RobertCrovella Thanks, your examples on the link are very useful. I think I know what step you're talking about, the for-loop for more than one Matrix perhaps? Actually, the original example has it but in my implementation I won't have more than one Matrix struct. – mrei Feb 27 '14 at 23:45
  • There are several things wrong with your code. You cannot do a `cudaMalloc` on `h_data->a`. We do `cudaMalloc` on *pointers*, by taking the address of the pointer, and passing it to `cudaMalloc`. `h_data->a` is not a pointer. It is an `int`. The step I was referring to was that somewhere along the way you have to copy a *pointer* that has been allocated by `cudaMalloc` to the device copy of the pointer `b`, i.e. d_data->b. Step 5 in the answer I linked. I think there may be other defects in your code as well. – Robert Crovella Feb 27 '14 at 23:56
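The step described in the last comment can also be done without a host staging struct. First copy the struct, then overwrite the device copy's `b` field with the address returned by `cudaMalloc`. Here is a sketch for the single-`Matrix` case mentioned in the comments. The helper name `copyOneMatrixToDevice` is hypothetical, and the ordering below is one reasonable choice rather than the exact code of the linked answer:

```cuda
#include <assert.h>
#include <cuda_runtime.h>

typedef struct {
  int a;
  int *b;
} Matrix;

// Copies one host Matrix, whose b holds n ints, to the device without a
// host staging struct. The device buffer address is also returned in
// *d_b_out so the caller can cudaFree it later. Returns NULL on failure.
Matrix *copyOneMatrixToDevice(const Matrix *data, size_t n, int **d_b_out)
{
  assert(data != NULL && n > 0);
  Matrix *d_data = NULL;
  int *d_b = NULL;

  // 1. Copy the whole struct: a arrives intact, b still holds a host address.
  bool ok = cudaMalloc(&d_data, sizeof(Matrix)) == cudaSuccess;
  if (!ok)
    d_data = NULL;
  if (ok)
    ok = cudaMemcpy(d_data, data, sizeof(Matrix),
                    cudaMemcpyHostToDevice) == cudaSuccess;

  // 2. Allocate the device buffer for b and copy the ints into it.
  if (ok && cudaMalloc(&d_b, n * sizeof(int)) != cudaSuccess) {
    d_b = NULL;
    ok = false;
  }
  if (ok)
    ok = cudaMemcpy(d_b, data->b, n * sizeof(int),
                    cudaMemcpyHostToDevice) == cudaSuccess;

  // 3. Overwrite the device copy's b with the device address. &d_data->b is
  //    only computed on the host (struct base + offset), never dereferenced.
  if (ok)
    ok = cudaMemcpy(&d_data->b, &d_b, sizeof(int *),
                    cudaMemcpyHostToDevice) == cudaSuccess;

  if (!ok) {
    cudaFree(d_b);               // both calls are no-ops on NULL
    cudaFree(d_data);
    return NULL;
  }
  *d_b_out = d_b;
  return d_data;
}
```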

0 Answers