I would like to create a generic CUDA kernel that takes a callable object (such as a lambda or a function) as a parameter and invokes it.
I am having trouble passing a device function to a CUDA kernel as a parameter.
I am using CUDA 9.2 on a GPU with compute capability 3.5, with gcc 9.3 on Debian 10.
I tried the following code, compiled with `nvcc -arch=sm_35 --expt-extended-lambda main.cu -o test`:

    #include <cstdio>

    // Callable from both host and device code.
    __host__ __device__ void say_hello()
    {
        printf("Hello World from function!\n");
    }

    // Generic kernel: invokes whatever callable it is given.
    template<class Function>
    __global__ void generic_kernel(Function f)
    {
        f();
    }

    int main()
    {
        // this is working
        generic_kernel<<<1,1>>>([]__device__(){printf("Hello World from lambda!\n");});
        cudaDeviceSynchronize();

        // this is not working!
        generic_kernel<<<1,1>>>(say_hello);
        cudaDeviceSynchronize();

        return 0;
    }

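Neither launch checks for errors, so if the second kernel faults, nothing is reported and the program simply prints less than expected. A minimal sketch of the second launch with error checking added would surface the failure instead. The helper `launch_with_host_pointer` is a hypothetical name, and the reasoning in the comments (that host code only sees the host address of `say_hello`) is an assumption about the cause rather than a confirmed diagnosis.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__host__ __device__ void say_hello() { printf("Hello World from function!\n"); }

template <class Function>
__global__ void generic_kernel(Function f) { f(); }

// Hypothetical helper: launches the kernel exactly as in the question and
// returns the first error reported by the launch or by the synchronize.
cudaError_t launch_with_host_pointer()
{
    // Function is deduced as void(*)(). Assumption: because this expression is
    // evaluated in host code, the pointer holds the HOST address of say_hello,
    // which is not a valid code address on the device.
    generic_kernel<<<1, 1>>>(say_hello);
    cudaError_t err = cudaGetLastError();  // errors detected at launch time
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();        // errors raised while the kernel ran
}

int main()
{
    cudaError_t err = launch_with_host_pointer();
    printf("status: %s\n", cudaGetErrorString(err));
    return err == cudaSuccess ? 0 : 1;
}
```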
I expected to see both "Hello World from function!" and "Hello World from lambda!", but I only see the message from the lambda.
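One commonly used workaround, sketched here under the assumption that the cause is that host code only has access to the host address of `say_hello`, is either to obtain the function's device address on the device side or to avoid raw addresses altogether by passing an object. The sketch shows both: `d_say_hello_ptr` is a `__device__` variable initialized with `say_hello` in device code and copied to the host with `cudaMemcpyFromSymbol`, and `SayHelloFunctor` is a small struct whose `__device__` call operator forwards to `say_hello`. These names, along with `fn_ptr` and the two `launch_via_*` helpers, are hypothetical. Wrapping the call in an extended lambda, `[] __device__ () { say_hello(); }`, should also work, for the same reason the original lambda does.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__host__ __device__ void say_hello() { printf("Hello World from function!\n"); }

template <class Function>
__global__ void generic_kernel(Function f) { f(); }

// Hypothetical names throughout: fn_ptr, d_say_hello_ptr, SayHelloFunctor,
// launch_via_device_pointer, launch_via_functor.
typedef void (*fn_ptr)();

// Initialized in device code, so it holds the DEVICE address of say_hello.
__device__ fn_ptr d_say_hello_ptr = say_hello;

// Passes an object instead of an address; the call operator runs on the device.
struct SayHelloFunctor {
    __device__ void operator()() const { say_hello(); }
};

cudaError_t launch_via_device_pointer()
{
    fn_ptr h_ptr = nullptr;
    // Copy the device-side address into a host variable.
    cudaError_t err = cudaMemcpyFromSymbol(&h_ptr, d_say_hello_ptr, sizeof(fn_ptr));
    if (err != cudaSuccess) return err;
    generic_kernel<<<1, 1>>>(h_ptr);  // now a valid device code address
    err = cudaGetLastError();
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();
}

cudaError_t launch_via_functor()
{
    generic_kernel<<<1, 1>>>(SayHelloFunctor{});
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();
}

int main()
{
    printf("pointer: %s\n", cudaGetErrorString(launch_via_device_pointer()));
    printf("functor: %s\n", cudaGetErrorString(launch_via_functor()));
    return 0;
}
```

The functor (or lambda) route keeps `generic_kernel` fully generic and typically lets the compiler inline the call, whereas the function-pointer route forces an indirect call on the device.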