I am struggling with a problem that seems a little obscure.

I am working on a framework where a user can provide an implementation of an abstract base class which, after a few steps of magic and code generation, will be used inside a CUDA kernel.

I know that

"It is not allowed to pass as an argument to a __global__ function an object of a class with virtual functions."

because the vtable will be junk when created on the host and then copied to the GPU. However, I do not pass the object to the kernel; I construct it inside the kernel, which should not cause the vtable issue.

class VirtualBase {
public:
    __device__ virtual int getResult() const = 0;
    __device__ virtual ~VirtualBase();
};

class Implementation : public VirtualBase {
public:
    __device__ Implementation(){};
    __device__ int getResult() const { return 42; };
    __device__ ~Implementation() {};
};

__global__ void kernel() {
    Implementation impl;
    int res = impl.getResult();
}

int main(void) {
    kernel<<<1, 1>>>();
    return 0;
}

The code is compiled with Nsight's autogenerated makefile:

/Developer/NVIDIA/CUDA-7.5/bin/nvcc -G -g -O0 -std=c++11 -gencode arch=compute_30,code=sm_30  -odir "src" -M -o "src/main.d" "../src/main.cu"
/Developer/NVIDIA/CUDA-7.5/bin/nvcc -G -g -O0 -std=c++11 --compile --relocatable-device-code=false -gencode arch=compute_30,code=compute_30 -gencode arch=compute_30,code=sm_30  -x cu -o  "src/main.o" "../src/main.cu"

which results in the error

ptxas fatal   : Unresolved extern function '_ZN11VirtualBaseD2Ev'
make: *** [src/main.o] Error 255

I am on a Mac with CUDA 7.5 installed, but I tried the same thing on a machine with Ubuntu 14.10 and CUDA 7.0, with the same result.

Dave

1 Answer

After more hours of debugging, writing this question, and staring at the ptxas error, I began to suspect that the missing function is the destructor of the base class, because of the D near the end of _ZN11VirtualBaseD2Ev.

I looked for ways to demangle the identifier and, indeed, the D stands for destructor (and the standard constructor has a C in the same spot).
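For anyone who wants to check a symbol like this themselves, here is a minimal sketch of demangling it programmatically. It assumes a GCC or Clang host toolchain, where `abi::__cxa_demangle` from `<cxxabi.h>` is available; the helper name `demangle` is made up for illustration. Piping the symbol through the `c++filt` command-line tool should give the same answer.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-ABI symbol,
// or the input unchanged if demangling fails.
std::string demangle(const char* mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result with malloc.
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;
    }
    std::string result(out);
    std::free(out);  // the caller owns the malloc-allocated buffer
    return result;
}
```

Calling `demangle("_ZN11VirtualBaseD2Ev")` yields `VirtualBase::~VirtualBase()`, which confirms that the unresolved function is the base class destructor.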

A few debug statements later, I realised that when Implementation impl; goes out of scope, both destructors are called: its own first, and the base class's after that. Since the base class's destructor is declared but has no implementation, the call cannot be resolved, and ptxas reports the error.
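The destructor order can be reproduced in plain host C++, without a GPU. The sketch below is an illustration rather than the exact debug code I used: `g_log` and `runOnce` are hypothetical names, and the `__device__` qualifiers are left out so that it builds with an ordinary C++11 compiler. The important part is that `~VirtualBase()` now has a body; in the CUDA version the same definition would presumably be written as `__device__ virtual ~VirtualBase() {}`.

```cpp
#include <string>
#include <vector>

// Hypothetical log, used only to record the order of destructor calls.
std::vector<std::string> g_log;

class VirtualBase {
public:
    virtual int getResult() const = 0;
    // Declared AND defined: the derived destructor calls this one implicitly,
    // so it needs a body even though the class is abstract.
    virtual ~VirtualBase() { g_log.push_back("~VirtualBase"); }
};

class Implementation : public VirtualBase {
public:
    int getResult() const override { return 42; }
    ~Implementation() override { g_log.push_back("~Implementation"); }
};

// Creates an Implementation in a local scope and returns its result.
int runOnce() {
    Implementation impl;
    return impl.getResult();
}   // impl goes out of scope here: ~Implementation runs, then ~VirtualBase
```

After a call to `runOnce()`, `g_log` holds `~Implementation` followed by `~VirtualBase`. If the body of `~VirtualBase()` is removed again, the host linker reports an undefined reference in the same way ptxas does for the device code.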

Edit: This destructor calling is, of course, not a CUDA issue, but standard C++ routine. Additionally, as Robert Crovella pointed out in the comments, CUDA does support classes that implement virtual functions if they are instantiated on the device.

Dave
    And to answer your original question, yes, you can use a class that implements virtual functions inside a CUDA kernel. The way to do this is to instantiate objects of those classes on the device, not on the host. The restriction is that you can't pass objects of such classes from host to device. The restriction does not state that you are unable to use them on the device. – Robert Crovella Oct 18 '15 at 14:02
    A base class should always provide an implementation of its virtual destructor (eg `virtual ~Base() = default;`). You may want to [read a C++ book](http://stackoverflow.com/questions/388242/the-definitive-c-book-guide-and-list). – user703016 Oct 19 '15 at 01:07