3

I have a Parent class and an inherited Child class:

class Parent {};
class Child : public Parent {};

There are a couple of child classes that inherit from Parent, but for simplicity I have only included one. These inherited classes are necessary for the project I am working on. I also have an object of another class, which I wish to copy onto the device:

class CopyClass {
  public:
    Parent ** par;
};

Note that the Parent ** par; is there because I need a list of Child objects, but which child will be used (and the length of the list) is unknown at compile time. Here is my attempt at copying a CopyClass object onto the device:

int length = 5;

//Instantiate object on the CPU
CopyClass cpuClass;
cpuClass.par = new Parent*[length];
for(int i = 0; i < length; ++i) cpuClass.par[i] = new Child;

//Copy object onto GPU
CopyClass * gpuClass;
cudaMalloc(&gpuClass,sizeof(CopyClass));
cudaMemcpy(gpuClass,&cpuClass,sizeof(CopyClass),cudaMemcpyHostToDevice);

//Copy dynamically allocated variables to GPU
Parent ** d_par;
d_par = new Parent*[length];
for(int i = 0; i < length; ++i) {
    cudaMalloc(&d_par[i],sizeof(Child));
    printf("\tCopying data\n");
    cudaMemcpy(d_par[i],cpuClass.par[i],sizeof(Child),cudaMemcpyHostToDevice);
}

//SIGSEGV returned during following operation
cudaMemcpy(gpuClass->par,d_par,length*sizeof(void*),cudaMemcpyHostToDevice);

I have seen multiple similar problems here, here, here, here, and here, but either I couldn't understand the problem they were having, or it didn't seem to fit this particular issue.

I know that the segmentation fault I am getting is because gpuClass->par lives on the device, and the host cannot dereference a device pointer in a cudaMemcpy call. However, I see no other way to "insert" the pointer into the gpuClass object.

The ways I could see to solve this are:

1) Flatten my data structure. However, I don't know how to do this while keeping the inheritance functionality that I want.

2) Instantiate gpuClass directly on the GPU, which I don't know how to do, or

3) I have seen in one of the solutions that you can use cudaMemcpy to copy the address of your dynamically allocated list into an object, but once again, I don't know how to do that (specifically, copying a device pointer to the location of another device pointer).

Any help would be greatly appreciated.

Simon Ewing
  • The answer is exactly the same as the other questions you linked to. Build a copy of the device structure in host memory first and then copy that to the device – talonmies Nov 18 '16 at 17:06
  • I believe that is what I've done in the code above. I believe I have to first copy the class onto the device, then create the dynamically allocated variables on the host and copy those onto the device. However, I don't know how to copy this dynamic variable _into the object_ on the gpu. – Simon Ewing Nov 18 '16 at 17:59
  • You are not doing that. gpuClass would need to be allocated in host memory, assigned the device pointer to the already assembled array of device pointers, then copied to the device – talonmies Nov 19 '16 at 08:37

1 Answer

5

In your first related link I give 5 steps for the object-based deep-copy sequence, but this case is complicated by the fact that you are doing a double-pointer version of the example given there. The complexity of a double-pointer deep copy is such that the usual recommendation is to avoid it (i.e. flatten).

The first fix we need to make to your code is to properly handle the d_par array. The array associated with d_par has storage for 5 object pointers. You've allocated host-side storage for it (with new), but nowhere are you doing a device-side allocation for it. (I'm not talking about the d_par pointer itself; I'm talking about what it points to, which is an array of 5 pointers.) You need to make a corresponding allocation on the device to hold that array.

The second fix we need to make is to fix up the par pointer itself (as opposed to what it points to) in the top-level device-side object. You've attempted to combine both of these into a single cudaMemcpy, but that won't work.

Here's a modified version of your code that seems to work correctly with the above changes:

$ cat t29.cu
#include <stdio.h>

class Parent {public: int my_id;};
class Child : public Parent {};

class CopyClass {
  public:
    Parent ** par;
};

const int length = 5;

__global__ void test_kernel(CopyClass *my_class){

  for (int i = 0; i < length; i++)
    printf("object: %d, id: %d\n", i, my_class->par[i]->my_id);
}

int main(){


//Instantiate object on the CPU
  CopyClass cpuClass;
  cpuClass.par = new Parent*[length];
  for(int i = 0; i < length; ++i) {
    cpuClass.par[i] = new Child;
    cpuClass.par[i]->my_id = i+1;} // so we can prove that things are working

//Allocate storage for object onto GPU and copy host object to device
  CopyClass * gpuClass;
  cudaMalloc(&gpuClass,sizeof(CopyClass));
  cudaMemcpy(gpuClass,&cpuClass,sizeof(CopyClass),cudaMemcpyHostToDevice);

//Copy dynamically allocated child objects to GPU
  Parent ** d_par;
  d_par = new Parent*[length];
  for(int i = 0; i < length; ++i) {
    cudaMalloc(&d_par[i],sizeof(Child));
    printf("\tCopying data\n");
    cudaMemcpy(d_par[i],cpuClass.par[i],sizeof(Child),cudaMemcpyHostToDevice);
  }

//Copy the d_par array itself to the device

  Parent ** td_par;
  cudaMalloc(&td_par, length * sizeof(Parent *));
  cudaMemcpy(td_par, d_par, length * sizeof(Parent *), cudaMemcpyHostToDevice);

//copy *pointer value* of td_par to appropriate location in top level object
  cudaMemcpy(&(gpuClass->par),&(td_par),sizeof(Parent **),cudaMemcpyHostToDevice);

  test_kernel<<<1,1>>>(gpuClass);
  cudaDeviceSynchronize();
  return 0;


}
$ nvcc -arch=sm_61 -o t29 t29.cu
$ cuda-memcheck ./t29
========= CUDA-MEMCHECK
        Copying data
        Copying data
        Copying data
        Copying data
        Copying data
object: 0, id: 1
object: 1, id: 2
object: 2, id: 3
object: 3, id: 4
object: 4, id: 5
========= ERROR SUMMARY: 0 errors
$
Robert Crovella
  • Thank you, @Robert. In the last `cudaMemcpy`, where you copy the pointer value across, it seems like we are copying a device pointer to a device pointer. Is this the case, and if so, is this always allowed for `cudaMemcpy`? – Simon Ewing Nov 20 '16 at 00:09
  • The concern is the locations, not what we are copying per se. In the case of the source (`&(td_par)`), it is a location that is in host memory. In the case of the destination `(&(gpuClass->par)`), it is a location that is in device memory. If the source location is in host memory, and the destination location is in device memory, then that is always legal for `cudaMemcpyHostToDevice`. In both cases (source and destination), the locations can be computed by the compiler without the need for a device pointer dereference (which is also a necessary requirement). – Robert Crovella Nov 20 '16 at 00:58