Is There Any Way To Copy vtable From Host To Device (CUDA & C++)

Question

It seems that CUDA does not allow me to "pass an object of a class derived from virtual base classes to __global__ function", for some reason related to the "virtual table" or "virtual pointer".

I wonder whether there is some way for me to set up the "virtual pointer" manually, so that I can use polymorphism?

In a word, no. You can potentially construct the object on the GPU, but you can't copy class dispatch data to the device (and it would make no sense to do so anyway) — talonmies, Aug 12 '20 at 15:37
You could fake polymorphism instead by storing an int for the object type and doing a switch case in the functions. It is (a bit) less runtime-efficient, but fully binary-compatible: no conversion necessary between host and device, and no special construction or copy. — Sebastian, Aug 13 '20 at 17:04
That is what I used to do! But in some situations "true polymorphism" seems necessary, for example when I build the BVH structure, which is derived from "Hitable" and contains pointers to other "Hitable"s. — dsukrect, Aug 14 '20 at 09:53
You would create one general class with (at least) two attributes: one integer specifying whether the object is a Hitable or a BVH and one pointer pointing to another general class. — Sebastian, Aug 14 '20 at 12:52
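A minimal sketch of that suggestion, with hypothetical names (`Hitable`, `Kind`, `hit`, `validIndices`, `makeExampleScene`) and 2D circles standing in for real primitives. It departs from the comment in one way: children are stored as indices into a single array rather than as raw pointers, so the array contains no host addresses and could be copied to the device with one `cudaMemcpy`. The `HD` macro is an assumption that lets the same file build with nvcc or with an ordinary C++ compiler.

```cpp
#include <cassert>
#include <type_traits>

// Assumption: HD is __host__ __device__ under nvcc and empty otherwise,
// so the same source builds with nvcc or a plain C++ compiler.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// The type tag that replaces the vtable.
enum Kind : int { SPHERE = 0, BVH_NODE = 1 };

// One general record for every node type. No virtual functions, so it is
// trivially copyable and has the same bytes on host and device.
struct Hitable {
    int kind;
    float cx, cy, r;  // used when kind == SPHERE (a 2D circle, for brevity)
    int left, right;  // used when kind == BVH_NODE: indices into the same array
};
static_assert(std::is_trivially_copyable<Hitable>::value,
              "Hitable must be safe to copy byte-for-byte");

// Hand-written dispatch: a switch on the tag instead of a virtual call.
// Recursion keeps the sketch short; deep BVHs usually use an explicit stack.
HD bool hit(const Hitable* nodes, int i, float px, float py) {
    const Hitable& n = nodes[i];
    switch (n.kind) {
    case SPHERE: {
        float dx = px - n.cx, dy = py - n.cy;
        return dx * dx + dy * dy <= n.r * n.r;
    }
    case BVH_NODE:
        return hit(nodes, n.left, px, py) || hit(nodes, n.right, px, py);
    default:
        return false;
    }
}

// Host-side check that every BVH child index points inside the array.
bool validIndices(const Hitable* nodes, int count) {
    for (int i = 0; i < count; ++i) {
        if (nodes[i].kind != BVH_NODE) continue;
        if (nodes[i].left < 0 || nodes[i].left >= count) return false;
        if (nodes[i].right < 0 || nodes[i].right >= count) return false;
    }
    return true;
}

// Hypothetical example scene: two circles under one BVH node.
// Returns the index of the root node.
int makeExampleScene(Hitable out[3]) {
    out[0] = Hitable{SPHERE, 0.0f, 0.0f, 1.0f, -1, -1};
    out[1] = Hitable{SPHERE, 5.0f, 0.0f, 1.0f, -1, -1};
    out[2] = Hitable{BVH_NODE, 0.0f, 0.0f, 0.0f, 0, 1};
    assert(validIndices(out, 3));
    return 2;
}
```

On the GPU you would `cudaMemcpy` the array once and pass the device pointer and the root index to a kernel that calls `hit`. With raw child pointers instead of indices, every pointer would first have to be rewritten to a device address.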

Robert Crovella · Answer 1 · 2020-08-12T16:06:29.443

Is There Any Way To Copy vtable From Host To Device

You wouldn't want to copy the vtable from host to device. An object created on the host contains a hidden pointer (the "vptr") to the host's vtable, and that table holds host function addresses. When you copy such an object to the device, the vptr doesn't get changed or "fixed up", so you end up with an object on the device whose vptr still holds the address of the host vtable.

If you then try to call one of those virtual functions (using the object on the device, from device code), bad things happen: both that table address and the host function entry points it leads to are addresses that don't make any sense in device code.
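To make this concrete, the host-only C++ sketch below (hypothetical `Base`/`Derived` types and helper functions) shows that the vtable pointer is ordinary bytes inside the object, so any byte-wise copy, which is what `cudaMemcpy` performs, carries it along unchanged. It assumes an Itanium-ABI compiler such as GCC or Clang, where the vptr is the first pointer-sized word of a polymorphic object; the C++ standard does not specify this layout.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct Base {
    virtual ~Base() = default;
    virtual int id() const { return 0; }
};

struct Derived : Base {
    int id() const override { return 1; }
    int payload = 0;
};

// Copy out the first pointer-sized word of an object's bytes. Assumption:
// under the Itanium C++ ABI this word is the vptr, i.e. the address of the
// class's vtable in this (host) process.
std::uintptr_t firstWord(const void* obj) {
    std::uintptr_t word;
    std::memcpy(&word, obj, sizeof word);
    return word;
}

// Mimic what cudaMemcpy does to an object: copy its raw bytes elsewhere,
// then report the first word of the copy.
std::uintptr_t firstWordAfterByteCopy(const Derived& d) {
    alignas(Derived) unsigned char buffer[sizeof(Derived)];
    std::memcpy(buffer, &d, sizeof(Derived));
    return firstWord(buffer);
}
```

Two `Derived` objects share one vptr value regardless of their data, a `Base` has a different one, and the raw copy keeps the original host address, which is what a device-side copy of a host object would contain.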

so that I can use the polymorphism

My recommendation for a way to use polymorphism in device code is to create the object on the device. This sets up the vtable with a set of device function pointers, rather than host function pointers, and questions such as this demonstrate that it works. To a first order approximation, if you have a way to create a set of polymorphic objects in host code, I don't know of any reason why you shouldn't be able to use a similar method in device code. The issue really has to do with interoperability - moving such objects between host and device - which is what the stated limitations in the programming guide are referring to.
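A minimal sketch of creating the objects on the device, assuming hypothetical classes (`Shape`, `Square`, `Rect`) and a plain-data description `ShapeDesc` that has no virtual functions and is therefore safe to `cudaMemcpy`. The same `__host__ __device__` factory builds objects on either side, and whichever side runs the constructor gets a vptr valid for that side. The kernels compile only under nvcc (guarded by `__CUDACC__`), and `HD` is an assumed helper macro; the host path lets the factory be checked with an ordinary compiler.

```cpp
#include <cassert>

// Assumption: HD is __host__ __device__ under nvcc and empty otherwise.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Plain data: no virtual functions, so it is safe to cudaMemcpy.
struct ShapeDesc {
    int kind;    // hypothetical encoding: 0 = square, 1 = rectangle
    float a, b;  // side lengths (b unused for squares)
};

struct Shape {
    HD virtual ~Shape() {}
    HD virtual float area() const = 0;
};

struct Square : Shape {
    float s;
    HD Square(float side) : s(side) {}
    HD float area() const override { return s * s; }
};

struct Rect : Shape {
    float w, h;
    HD Rect(float w_, float h_) : w(w_), h(h_) {}
    HD float area() const override { return w * h; }
};

// Whichever side runs this constructs the object there, so its vptr points
// to that side's vtable: host when called from host code, device when
// called from a kernel.
HD Shape* makeShape(const ShapeDesc& d) {
    switch (d.kind) {
    case 0:  return new Square(d.a);
    case 1:  return new Rect(d.a, d.b);
    default: return nullptr;
    }
}

// Host-side counterpart: build, dispatch virtually, destroy.
float hostSumAreas(const ShapeDesc* desc, int n) {
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) {
        Shape* s = makeShape(desc[i]);
        assert(s != nullptr && "unknown shape kind");
        sum += s->area();
        delete s;
    }
    return sum;
}

#ifdef __CUDACC__
// desc must already be in device memory (copied with cudaMemcpy).
// Device-side new allocates from the device heap.
__global__ void buildShapes(const ShapeDesc* desc, Shape** out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = makeShape(desc[i]);
}

// Launch with <<<1,1>>>; the virtual calls resolve through device vptrs.
__global__ void sumAreas(Shape* const* shapes, int n, float* total) {
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) sum += shapes[i]->area();
    *total = sum;
}

// Objects created with device-side new must be deleted in device code.
__global__ void destroyShapes(Shape** shapes, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) delete shapes[i];
}
#endif
```

A host program would `cudaMemcpy` an array of `ShapeDesc` to the device, `cudaMalloc` a device array of `Shape*`, then launch `buildShapes`, `sumAreas` and finally `destroyShapes`. Objects from device-side `new` persist across kernel launches until deleted; for many objects the device heap may need enlarging with `cudaDeviceSetLimit(cudaLimitMallocHeapSize, ...)` before the first such kernel runs.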

I wonder whether there is some way for me to set up the "virtual pointer" manually

There might be. In the interest of sharing knowledge, I will outline a method. However, I don't know C++ well enough to say whether this is acceptable/legal. The only thing I can say is in my very limited testing, it appears to work. But I would assume it is not legal and so I do not recommend you use this method for anything other than experimentation. Even if we don't resolve whether or not it is legal, there is already a stated CUDA limitation (as indicated above) that you should not attempt to pass objects with virtual functions between host and device. So I offer it merely as an observation, which may be interesting for experimentation or research. I don't suggest it for production code.

The basic idea is outlined in this thread. It is predicated on the idea that an ordinary object copy (through the copy constructor) does not copy the virtual function table pointer, which makes sense to me, but that the object's bytes as a whole do contain that pointer. Therefore if we use a method like this:

template<typename T>
__device__ void fixVirtualPointers(T *other) {
        // T must be *other's exact dynamic type. Copy-constructing in device
        // code gives temp a device vptr; only the data members are copied.
        T temp(*other);
        // Byte-wise copy of the whole object: also replaces other's host vptr
        memcpy(other, &temp, sizeof(T));
}

it seems to be possible to take a given object, create a new "dummy" object of that type in device code, and then "fix up" the vtable pointer by doing a byte-wise copy of the entire object (memcpy over sizeof(T) bytes) rather than a "typical" object copy. Use this at your own risk. This blog may also be interesting reading, although I can't vouch for the correctness of any statements there.

Beyond this, there are a variety of other suggestions here on the cuda tag; you may wish to review them.

Thanks for the answer! But I still don't know how "create the object on device" works. For instance, I have loaded a triangle mesh from a file and created a bunch of "Triangle" objects; how do I "recreate" them on the device? I also have some trouble building the BVH structure, which is a derived class of "Hitable" and contains pointers to other "Hitable"s; it seems impossible to achieve without polymorphism. — dsukrect, Aug 12 '20 at 23:52
You said: "I have loaded a triangle mesh from file, and created a bunch of Triangle". What you would do is: load a triangle mesh from file, **transfer that triangle data to the device**, then create a bunch of "Triangle" (in device code). The linked [example](https://stackoverflow.com/questions/22988244/polymorphism-and-derived-classes-in-cuda-cuda-thrust/23476510#23476510) shows one possible approach: objects created on the device from initialization data transferred from the host. Obviously I can't tell you how to rewrite code you haven't shown, and that's not really my intent anyway. — Robert Crovella, Aug 12 '20 at 23:56
So you mean I have to create a function like this: void createTraingle_kernel(Hitable* output, vec3 v0, vec3 v1, vec3 v2)? — dsukrect, Aug 13 '20 at 00:00
Yes, you might want to create a CUDA kernel. That is how you run device code. 1. "load a triangle mesh from file": that is host code. 2. "transfer that triangle data to the device": that would be one or more cudaMemcpy operations. 3. "then create a bunch of Triangle (in device code)": that could be a CUDA kernel that you write. — Robert Crovella, Aug 13 '20 at 00:03
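The three steps above can be sketched as follows. Assumptions: the mesh file uses a tiny OBJ-like subset (`v x y z` vertex lines and `f i j k` triangle lines with 1-based indices), and `Vec3`, `TriangleData`, `loadMesh`, `Hitable`, `Triangle`, `createTriangles` and `uploadAndCreate` are hypothetical stand-ins for the asker's code. Step 1 is plain host C++; steps 2 and 3 need nvcc and are guarded by `__CUDACC__`. Error checking of CUDA calls is omitted for brevity.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Plain data for one triangle: no virtual functions, safe to cudaMemcpy.
struct Vec3 { float x, y, z; };
struct TriangleData { Vec3 v0, v1, v2; };

// Step 1 (host code): load the mesh. Hypothetical OBJ-like subset:
// "v x y z" adds a vertex, "f i j k" adds a triangle (1-based indices).
std::vector<TriangleData> loadMesh(std::istream& in) {
    std::vector<Vec3> verts;
    std::vector<TriangleData> tris;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream ls(line);
        std::string tag;
        ls >> tag;
        if (tag == "v") {
            Vec3 v{};
            ls >> v.x >> v.y >> v.z;
            verts.push_back(v);
        } else if (tag == "f") {
            int i = 0, j = 0, k = 0;
            ls >> i >> j >> k;
            assert(i >= 1 && j >= 1 && k >= 1);
            // at() throws if an index refers to a vertex that was never defined.
            tris.push_back(TriangleData{verts.at(i - 1), verts.at(j - 1), verts.at(k - 1)});
        }
    }
    return tris;
}

#ifdef __CUDACC__
// Polymorphic classes, only ever constructed in device code.
struct Hitable {
    __device__ virtual ~Hitable() {}
    __device__ virtual float area() const = 0;
};

struct Triangle : Hitable {
    TriangleData t;
    __device__ Triangle(const TriangleData& d) : t(d) {}
    __device__ float area() const override {
        Vec3 a{t.v1.x - t.v0.x, t.v1.y - t.v0.y, t.v1.z - t.v0.z};
        Vec3 b{t.v2.x - t.v0.x, t.v2.y - t.v0.y, t.v2.z - t.v0.z};
        Vec3 c{a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
        return 0.5f * sqrtf(c.x * c.x + c.y * c.y + c.z * c.z);
    }
};

// Step 3 (device code): one thread per triangle; the constructor runs on the
// device, so each object gets a device vptr.
__global__ void createTriangles(const TriangleData* data, Hitable** out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = new Triangle(data[i]);
}

// Steps 2 and 3 from the host. Returns a device array of device pointers.
Hitable** uploadAndCreate(const std::vector<TriangleData>& tris) {
    int n = static_cast<int>(tris.size());
    if (n == 0) return nullptr;
    TriangleData* d_data = nullptr;
    Hitable** d_objs = nullptr;
    cudaMalloc(&d_data, n * sizeof(TriangleData));
    cudaMalloc(&d_objs, n * sizeof(Hitable*));
    // Step 2: plain data, so a byte copy is fine.
    cudaMemcpy(d_data, tris.data(), n * sizeof(TriangleData), cudaMemcpyHostToDevice);
    const int threads = 128;
    createTriangles<<<(n + threads - 1) / threads, threads>>>(d_data, d_objs, n);
    cudaDeviceSynchronize();
    cudaFree(d_data);  // each Triangle holds its own copy of its vertices
    return d_objs;
}
#endif
```

Launching one thread per triangle, rather than one kernel launch per triangle as a `createTraingle_kernel(Hitable* output, vec3 v0, vec3 v1, vec3 v2)` signature would suggest, keeps the launch count constant. The returned `Hitable**` can be handed to a later kernel (for example one that builds the BVH), and the objects must eventually be freed with `delete` in device code.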

score 2 · Answer 2 · answered Aug 13 '20 at 06:14

I would like to provide a different way to fix the vtable which does not rely on copying the vtable between objects. The idea is to use placement new on the device to let the compiler generate the appropriate vtable. However, this approach also violates the restrictions stated in the programming guide.

#include <cstdio>
#include <new>  // placement new, used in fixKernel

struct A{
    __host__ __device__
    virtual void foo(){
        printf("A\n");
    }
};

struct B : public A{

    B(int i = 13) : data(i){}

    __host__ __device__
    virtual void foo() override{
        printf("B %d\n", data);
    }

    int data;
};

template<class T>
__global__
void fixKernel(T* ptr){
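    T tmp(*ptr);        // copy the data members; tmp gets a device vptr

    new (ptr) T(tmp);   // construct in place, writing a device vptr into *ptr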
}

__global__
void useKernel(A* ptr){
    ptr->foo();
}


int main(){

    A a;
    a.foo();

    B b(7); 
    b.foo();

    A* ab = new B();

    ab->foo();

    // Raw byte copies: the device objects still carry host vptrs here.
    A* d_a;
    cudaMalloc(&d_a, sizeof(A));
    cudaMemcpy(d_a, &a, sizeof(A), cudaMemcpyHostToDevice);

    B* d_b;
    cudaMalloc(&d_b, sizeof(B));
    cudaMemcpy(d_b, &b, sizeof(B), cudaMemcpyHostToDevice);

    // Re-construct each object on the device (device vptr) before calling foo().
    fixKernel<<<1,1>>>(d_a);

    useKernel<<<1,1>>>(d_a);

    fixKernel<<<1,1>>>(d_b);

    useKernel<<<1,1>>>(d_b);

    cudaDeviceSynchronize();

    cudaFree(d_b);
    cudaFree(d_a);
    delete ab;
}
