I have a virtual function which returns a different lambda depending on the derived class:
#include <functional>

class Base
{
public:
    virtual std::function<float()> foo(void) = 0;
};

class Derived : public Base
{
public:
    std::function<float()> foo(void) {
        return [] __device__ (void) {
            return 1.0f;
        };
    }
};
Then I want to pass this lambda to a CUDA kernel and call it from the device. In other words, I want to do this:
template<typename Func>
__global__ void kernel(Func f) {
    f();
}
int main(int argc, char** argv)
{
    Base* obj = new Derived;
    kernel<<<1, 1>>>(obj->foo());
    cudaDeviceSynchronize();
    return 0;
}
The above gives an error like this: calling a __host__ function("std::function<float ()> ::operator ()") from a __global__ function("kernel< ::std::function<float ()> > ") is not allowed
As you can see, I declare my lambda as __device__, but the foo() method stores it in a std::function in order to return it. As a result, what is passed to kernel() is a host address, and of course it does not work. I guess that is my problem, right? So my questions are:
1. Is it somehow possible to create a __device__ std::function and return that from the foo() method?
2. If this is not possible, is there any other way to dynamically select a lambda and pass it to the CUDA kernel? Hard-coding multiple calls to kernel() with all the possible lambdas is not an option.
So far, from the quick research I have done, it seems CUDA does not support the syntax required to make a function return a device lambda. I just hope I am wrong. :) Any ideas?
Thanks in advance