
Can I copy a C++ object to the device?

Say I have:

class CudaClass
{
public:
int* data;
CudaClass(int x) {
    data = new int[1]; data[0] = x;
}
};

__global__ void useClass(CudaClass cudaClass)
{
    printf("%d", cudaClass.data[0]);
}


int main()
{
    CudaClass c(1);
}

Now how do I copy "c" to device memory and launch kernel "useClass"?

SpaceMonkey

1 Answer


Yes, you can copy an object to the device for use on the device. When the object has embedded pointers to dynamically allocated regions, the process requires some extra steps.

See my answer here for a discussion of what is involved. That answer also links to a few sample code answers.

Also, in your class definition, if you want certain member functions to be usable on the device, you should decorate those functions appropriately (i.e. probably with __host__ __device__).

EDIT: In response to a question (now deleted) here is the simplest sample code I could come up with based on the supplied code:

#include <stdio.h>

class CudaClass
{
public:
int* data;
CudaClass(int x) {
    data = new int[1]; data[0] = x;
}
};

__global__ void useClass(CudaClass *cudaClass)
{
    printf("%d\n", cudaClass->data[0]);
}

int main()
{
    CudaClass c(1);
    // create class storage on device and copy top level class
    CudaClass *d_c;
    cudaMalloc((void **)&d_c, sizeof(CudaClass));
    cudaMemcpy(d_c, &c, sizeof(CudaClass), cudaMemcpyHostToDevice);
    // make an allocated region on device for use by pointer in class
    int *hostdata;
    cudaMalloc((void **)&hostdata, sizeof(int));
    cudaMemcpy(hostdata, c.data, sizeof(int), cudaMemcpyHostToDevice);
    // copy pointer to allocated device storage to device class
    cudaMemcpy(&(d_c->data), &hostdata, sizeof(int *), cudaMemcpyHostToDevice);
    useClass<<<1,1>>>(d_c);
    cudaDeviceSynchronize();
    return 0;
}

In the interest of brevity/clarity I have dispensed with the usual CUDA error checking.
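For reference, the error checking that was left out is often written as a small wrapper macro. The following is only a minimal sketch of one common pattern: `CUDA_CHECK` and the `writeValue` kernel are hypothetical names made up for this illustration and are not part of the CUDA API or of the code above.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro (not part of the CUDA API): print a message and
// exit if a runtime call returns anything other than cudaSuccess.
#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",               \
                    cudaGetErrorString(err_), __FILE__, __LINE__);     \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Hypothetical kernel used only to have something to launch.
__global__ void writeValue(int *out)
{
    *out = 42;
}

int main()
{
    int *d_value = nullptr;
    int h_value = 0;
    CUDA_CHECK(cudaMalloc((void **)&d_value, sizeof(int)));
    writeValue<<<1, 1>>>(d_value);
    CUDA_CHECK(cudaGetLastError());       // reports kernel launch errors
    CUDA_CHECK(cudaDeviceSynchronize());  // reports errors raised while the kernel ran
    CUDA_CHECK(cudaMemcpy(&h_value, d_value, sizeof(int), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_value));
    printf("%d\n", h_value);
    return 0;
}
```

Each `cudaMalloc`, `cudaMemcpy`, and kernel launch in the sample above could be wrapped the same way.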

Responding to the question: you cannot allocate storage directly from the host using the pointer in the device-based class. This is because cudaMalloc expects the pointer variable it fills in to reside in ordinary host memory, such as what you get with:

int *hostdata;

cudaMalloc cannot work with a pointer whose storage is already on the device. This will not work:

cudaMalloc(&(d_c->data), sizeof(int));

because cudaMalloc would have to store the new address through a device pointer (d_c) from host code, which is not allowed.
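A possible variation, sketched here under the assumption that the class stays trivially copyable as in the question, is to do the pointer fix-up on the host instead: make a host-side shallow copy of the object, point its data member at the device allocation, and copy the whole object once. The names `staged` and `d_data` are introduced only for this illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

class CudaClass
{
public:
    int* data;
    CudaClass(int x) {
        data = new int[1]; data[0] = x;
    }
};

__global__ void useClass(CudaClass *cudaClass)
{
    printf("%d\n", cudaClass->data[0]);
}

int main()
{
    CudaClass c(1);
    // allocate device storage for the payload and copy the payload over
    int *d_data;
    cudaMalloc((void **)&d_data, sizeof(int));
    cudaMemcpy(d_data, c.data, sizeof(int), cudaMemcpyHostToDevice);
    // host-side shallow copy whose data member holds the device address
    CudaClass staged = c;
    staged.data = d_data;
    // copy the already fixed-up object to the device in one step
    CudaClass *d_c;
    cudaMalloc((void **)&d_c, sizeof(CudaClass));
    cudaMemcpy(d_c, &staged, sizeof(CudaClass), cudaMemcpyHostToDevice);
    useClass<<<1,1>>>(d_c);
    cudaDeviceSynchronize();
    // release device and host allocations
    cudaFree(d_c);
    cudaFree(d_data);
    delete[] c.data;
    return 0;
}
```

Here cudaMalloc only ever writes into `d_data` and `d_c`, which are ordinary host variables, so no device address is touched from host code. As above, error checking is omitted for brevity.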

Robert Crovella
  • OK, just one more question: what I see is that you allocate some memory on the device, then copy the pointer value to the array inside the object. Why can't I allocate for myobject.array directly instead of using a "middle" variable to hold the data and copying its pointer into myobject.array? – SpaceMonkey Apr 15 '13 at 22:01
  • Responded to this question with an edit to my answer. I believe I've already addressed this question as well in questions posted after one of the linked answers. – Robert Crovella Apr 15 '13 at 23:26
  • Thank you so much, very clear answer! One more question if I may: why can't I do `cudaMalloc((void**)&data, 100*sizeof(int))` in the constructor instead of `data = new int[100]`? I thought that should allocate on the device directly instead of allocating on the host and then copying to the device. Cheers – SpaceMonkey Apr 16 '13 at 09:29
  • You could do that, I don't recall saying you couldn't. However, you then have a host based class with a pointer to device based storage (only). Such storage could not be initialized on the host the way your current class definition does: `data[0]=x;` You could still use `cudaMemset` perhaps, or something considerably more complicated in the constructor. You're just subsuming some of the code I have written into the constructor when you do that. Live it up. – Robert Crovella Apr 16 '13 at 16:30
  • @RobertCrovella I hope you can clarify the flag here: `cudaMemcpy(&(d_c->data), &hostdata, sizeof(int *), cudaMemcpyHostToDevice);` Why not use `cudaMemcpyDeviceToDevice`? I think both of the pointers are on the device – Mohamed Sakr Jun 14 '14 at 03:20
  • `hostdata` contains a pointer to device memory. But `&hostdata` is a pointer to a location in host memory. So no, that pointer is not "on the device" and it does not point to a location "on the device". `&hostdata` points to a location in host memory. `hostdata` points to a location in device memory. – Robert Crovella Jun 14 '14 at 05:21
  • How would you go about copying the class object and its data back? I've been trying to do just this for a couple of days now... – Patrick Cook Jul 27 '18 at 19:08
  • For this trivially simple class in this problem, the only thing necessary would be to reverse the `cudaMemcpy` operation that copies the actual data, so, after the kernel, something like: `cudaMemcpy(c.data, hostdata, sizeof(int), cudaMemcpyDeviceToHost);`. A more complicated object would probably require several steps. Since a direct copy of the object itself from device back to host would invalidate the embedded pointers in the object, you would need a way to preserve or "fixup" these pointers, much the same way this answer demonstrates "fixing up" the embedded pointers in the device object. – Robert Crovella Jul 27 '18 at 19:46
  • Can the device allocation be taken care of within the class constructor? – If_You_Say_So Apr 06 '21 at 19:06
  • I believe so. Why not try it? – Robert Crovella Apr 06 '21 at 19:21
  • This works great for my application, thanks. I suppose I also need to use a `cudaFree` on `d_c->data`? I was thinking of putting that in the destructor of `CudaClass`. Say my `CudaClass` also has a lot of scalar integers and doubles (~hundred), do I need to free them as well? – rinkert Sep 30 '21 at 14:22
  • In C++, I would say a general rule of thumb is that if your application uses a dynamic allocator to allocate space for a variable, it's good practice that your application also contains a corresponding de-allocator. Apply that rule rigorously. Examples of dynamic allocators are `malloc()`, `new`, `cudaMalloc`, `cudaMallocManaged`, etc. The corresponding de-allocators are `free()`, `delete`, and `cudaFree`. If that doesn't clear it up for you, I suggest asking a new question. I did not try to write production-ready code here. I generally don't do that, unless the question is about it. – Robert Crovella Sep 30 '21 at 14:26
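
Pulling together the comments on copying data back and on freeing memory, a minimal sketch of the full round trip might look like the following. The `incrementClass` kernel and the `roundTrip` helper are hypothetical names invented for this illustration, and error checking is again omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

class CudaClass
{
public:
    int* data;
    CudaClass(int x) {
        data = new int[1]; data[0] = x;
    }
};

// Hypothetical kernel: modifies the payload so the copy-back is observable.
__global__ void incrementClass(CudaClass *cudaClass)
{
    cudaClass->data[0] += 1;
}

// Hypothetical helper: copies an object to the device, runs the kernel,
// copies the payload back, and releases every allocation it made.
int roundTrip(int x)
{
    CudaClass c(x);
    CudaClass *d_c;
    int *d_data;
    cudaMalloc((void **)&d_c, sizeof(CudaClass));
    cudaMalloc((void **)&d_data, sizeof(int));
    cudaMemcpy(d_c, &c, sizeof(CudaClass), cudaMemcpyHostToDevice);
    cudaMemcpy(d_data, c.data, sizeof(int), cudaMemcpyHostToDevice);
    // fix up the embedded pointer in the device copy of the object
    cudaMemcpy(&(d_c->data), &d_data, sizeof(int *), cudaMemcpyHostToDevice);
    incrementClass<<<1,1>>>(d_c);
    cudaDeviceSynchronize();
    // copy only the payload back; c.data still holds the host address
    cudaMemcpy(c.data, d_data, sizeof(int), cudaMemcpyDeviceToHost);
    int result = c.data[0];
    // one de-allocator per allocator: cudaFree for cudaMalloc, delete[] for new[]
    cudaFree(d_data);
    cudaFree(d_c);
    delete[] c.data;
    return result;
}

int main()
{
    printf("%d\n", roundTrip(1));
    return 0;
}
```

Note that the device payload is freed through the host variable `d_data`, not through `d_c->data`, since the latter would again require reading device memory from host code. Scalar members stored inside the object need no separate free; they are released together with the object's own allocation.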