
I'm trying to create a class that will get allocated on the device. I want the constructor to run on the device so that the whole object, including the fields inside it, is automatically allocated on the device, instead of having to create a host object and then copy it manually to the device.

I'm using thrust::device_new.

Here is my code:

#include <stdio.h>
#include <thrust/device_ptr.h>
#include <thrust/device_new.h>

using namespace thrust;

class Particle
{
    public:
    int* data;

    __device__  Particle()
    {
        data = new int[10];
        for (int i=0; i<10; i++)
        {
            data[i] = i*2;
        }
    }
};


__global__ void test(Particle* p)
{
    for (int i=0; i<10; i++)
        printf("%d\n", p->data[i]);
}

int main() {

    device_ptr<Particle> p = device_new<Particle>();

    test<<<1,1>>>(thrust::raw_pointer_cast(p));


    cudaDeviceSynchronize();

    printf("Done!\n");

}

I annotated the constructor with __device__ and used device_new (thrust), but this doesn't work. Can someone explain why?

Cheers for help

SpaceMonkey
  • "Doesn't work" isn't a valid description of a problem. Please describe exactly what the problem is. – talonmies Apr 18 '13 at 12:25
  • The kernel doesn't print anything. – SpaceMonkey Apr 18 '13 at 18:48
  • You have *no* error checking at all in this code. Start by adding some (see [this question](http://stackoverflow.com/q/14038589/681865) for how). That will tell you more about what might be failing. – talonmies Apr 18 '13 at 19:16



I believe the answer lies in the description given here. Someone who knows thrust under the hood will probably come along and indicate whether this is true or not.

Although thrust has changed a lot since 2009, I believe device_new may still be using some form of operation where the object is actually temporarily instantiated on the host, then copied to the device. I believe the size limitation described in the above reference is no longer applicable, however.

I was able to get this to work:

#include <stdio.h>
#include <thrust/device_ptr.h>
#include <thrust/device_new.h>

#define N 512

using namespace thrust;

class Particle
{
    public:
    int data[N];

    __device__ __host__  Particle()
    {
//        data = new int[10];
        for (int i=0; i<N; i++)
        {
            data[i] = i*2;
        }
    }
};


__global__ void test(Particle* p)
{
    for (int i=0; i<N; i++)
        printf("%d\n", p->data[i]);
}

int main() {

    device_ptr<Particle> p = device_new<Particle>();

    test<<<1,1>>>(thrust::raw_pointer_cast(p));


    cudaDeviceSynchronize();

    printf("Done!\n");

}

Interestingly, it gives bogus results if I omit the __host__ decorator on the constructor, which suggests to me that the temporary-object copy mechanism is still in place. It also gives bogus results (and cuda-memcheck reports out-of-bounds access errors) if I switch to dynamic allocation for data instead of static, again suggesting that device_new creates a temporary object on the host and then copies it to the device.

Robert Crovella
  • yeah, I think you are right. I think it's just doing the same as the manual way of initialising the object on the host and then copying it to the device (as you did in your answer to my previous question). So the dynamically allocated field inside obviously needs to be copied manually (again as you showed in your answer to my previous question). I thought that thrust would be more "intelligent" and could somehow initialise the object directly on the device, so that everything dynamically allocated inside would also be on the device. – SpaceMonkey Apr 18 '13 at 18:30
  • I think your method works because, when the array is statically allocated, it gets copied to the device along with the object. I need to test this by doing it manually; I will report back. – SpaceMonkey Apr 18 '13 at 18:33
  • Yes, with a static allocation, the entire `data` array is part of the object. With a dynamic allocation, only the pointer `data` is part of the object, strictly speaking. – Robert Crovella Apr 18 '13 at 18:56

First of all, thanks to Robert Crovella for his input (and previous answers).

So apparently I "overestimated" what device_new can do. I thought that it could initialise the object directly on the device, so that any memory allocated dynamically inside it would be allocated on the device too.

But it seems like device_new is basically just doing the same as the manual way:

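Particle temp;                       // constructor runs on the host
Particle *d_p;
cudaMalloc(&d_p, sizeof(Particle));  // device storage for one Particle
cudaMemcpy(d_p, &temp, sizeof(Particle), cudaMemcpyHostToDevice);  // copies the pointer value of data, not the ints it points to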

So it makes a temporary host object and copies it, just as you would manually. That means any memory allocated inside the constructor is allocated on the host, and only the pointer is copied as part of the object, so you cannot use that memory in a kernel. You have to copy that memory to the device manually, and thrust doesn't seem to do that.

So it's just a cleaner way of creating a temporary host object and copying it, except that you lose the ability to copy the dynamic memory allocated inside, since you don't have access to that temporary.

I hope in the future, there will be a method or a feature in CUDA that makes you initialise the object directly on the device so any dynamically allocated data in the constructor (or elsewhere) is allocated on the device too, instead of the tedious way of copying every piece of memory manually.

SpaceMonkey