Passing an object with virtual functions to a CUDA kernel

Question

The CUDA programming guide explicitly prohibits passing an object of any class with virtual functions to a __global__ function [1]. To my knowledge, Nvidia hasn't supplied an official explanation for why this isn't possible, but I surmise that it's because the vtable lookup results in an indirect jump, which isn't possible in device code [2].

In my project, I have a set of classes with virtual functions that rely on dynamic polymorphism in host code. I'd like to pass instances of these classes to the device but supply type information to the compiler via a template instead of using dynamic polymorphism. Even though I don't require dynamic polymorphism for my device code, I would like to be able to re-use my original classes with virtual functions instead of re-implementing them to remove the virtual functions.

Is it possible to write my code such that a vtable lookup is avoided in this case? If so, is it safe to rely on this behavior or should I be wary of using this in my project?

In the following minimal example, I'm passing a derived class object with virtual functions to a kernel function, but there is no reference to the base class or need for dynamic polymorphism. The compiler is aware of the type of the argument at compile-time and need not generate code that performs a vtable lookup.

// example.cu

#include <cstdio>

class Base {
public:
    __host__ __device__ virtual void foo() const = 0;

    virtual ~Base() = default;
};

class Derived : public Base {
public:
    __host__ __device__ void foo() const override
    {   
        std::printf("Called Derived::foo()\n");
    }   
};

template<class T>
__global__ void bar(const T derived)
{
    derived.foo();
}

int main()
{
    Derived d;
    bar<Derived><<<1, 1>>>(d);

    if (cudaPeekAtLastError() != cudaSuccess) return 1;
    if (cudaDeviceSynchronize() != cudaSuccess) return 2;
    return 0;
}

$ nvcc --run example.cu && echo $?
Called Derived::foo()
0

This example successfully compiles and runs for me with nvcc v9.0 (with gcc v6.1 as the host compiler). Is it possible to safely use this idiom, or am I relying on undefined behavior?

The "indirect jump" (i.e. the use of a virtual function pointer table) is certainly possible. The problem arises due to the fact that when such an object is constructed in host code (which would normally be true if you are passing such an object to device code) the vtable is full of host code addresses/function entry points. Such function entry points are not usable in device code. The address of a `__device__ __host__` function in CUDA is not guaranteed to be numerically the same between the `__host__` entry point and the `__device__` entry point. — Robert Crovella, May 20 '20 at 22:58
I see. Thanks for clearing up my confusion, @RobertCrovella. I suppose the other half of my question is, more generally, is it possible to write my kernel code in a way that ensures that the compiler won't try to generate a vtable lookup from device code. I suspect that it's not possible without relying on compiler-specific behavior. — Geoff M, May 20 '20 at 23:18
I'm not a language lawyer so I will choose not to answer. However I can make your code break with this seemingly innocuous replacement for your kernel code: `const Derived *p = &derived; p->foo();` I imagine you understand this already. It seems to me that compiler behavior is such that when the function is invoked via object pointer, the vtable gets used (even if it shouldn't be necessary!) but when accessed directly as an object-method, it will use the device code entry point. I don't know if the compiler is required to behave that way or not. Even if it is, it strikes me as somewhat fragile — Robert Crovella, May 21 '20 at 00:17
This also breaks: `__host__ __device__ void my_foo(const Derived &d) {d.foo();}`, and replace your kernel code with: `my_foo(derived);`. Seems fragile. At best. — Robert Crovella, May 21 '20 at 00:32
You're welcome. I like thought-provoking questions like this one, rather than the run-of-the-mill stuff. — Robert Crovella, May 21 '20 at 00:40
[This thread](https://forums.developer.nvidia.com/t/copying-objects-to-device-with-virtual-functions/54927) demonstrates a method to support my assertion in the first comment that the issue is with the vtable construction, and not with some supposed inability to use a vtable. Again, I don't have a sound language interpretation of correctness, but [this blog](https://diego.assencio.com/?index=e5f2a59886a83b5c7d2c0093dbf689f9) seems to agree with some of the comments I made in that thread. — Robert Crovella, May 22 '20 at 15:34
Yeah, I think your earlier comment makes sense. This issue is just an interesting case of the more general problem of trying to pass an object that's not trivially copyable to the device. Your hack from the forum thread is the kind that should only be spoken of in dark, smoke-filled rooms, haha. Interestingly, I think it would work really well in my case... — Geoff M, May 22 '20 at 18:24

Passing an object with virtual functions to a CUDA kernel

0 Answers0