How to pass a function as a cuda kernel parameter?

Question

I would like to create a generic CUDA kernel that takes a callable object (such as a lambda or a function) as a parameter and invokes it.

I am having trouble passing a device function to a CUDA kernel as a parameter.

I have CUDA 9.2 and a GPU with compute capability 3.5. I use gcc 9.3 on Debian 10.

I tried this, compiled with nvcc -arch=sm_35 --expt-extended-lambda main.cu -o test:

    #include <cstdio>

    __host__ __device__ void say_hello()
    {
        printf("Hello World from function!\n");
    }

    // Generic kernel: invokes whatever callable it receives.
    template<class Function>
    __global__ void generic_kernel(Function f)
    {
        f();
    }

    int main()
    {
        // this is working
        generic_kernel<<<1,1>>>([]__device__(){printf("Hello World from lambda!\n");});
        cudaDeviceSynchronize();

        // this is not working!
        generic_kernel<<<1,1>>>(say_hello);
        cudaDeviceSynchronize();

        return 0;
    }

I expected to see both Hello World from function! and Hello World from lambda! but I only see the message from the lambda.

The second example is passing the host function to the kernel, which is clearly wrong, among other problems — talonmies, Jul 29 '19 at 19:59
I thought that `say_hello` was a device function, so it could be passed to the kernel. Is it normal that nvcc does not throw any warning here? — T.L, Jul 29 '19 at 22:11
You have explicitly defined it as both a host and device function. The kernel launch runs in host code. The resulting function reference passed is a host function — talonmies, Jul 30 '19 at 03:36

Robert Crovella · Accepted Answer · 2019-07-30T02:53:24.470

At the time of writing, Debian is not a supported environment for any version of CUDA, and gcc 9.3 is not a supported host compiler for CUDA 9.2.

There are quite a few questions covering these topics here on the cuda tag. This answer links to a number of them.

The short version is that it is fundamentally impossible to capture a __device__ function address in host code. A kernel launch (as you have it here) is written in host code; it is host code. Therefore the use of say_hello there is in host code, and it will refer to the __host__ function pointer/address. That function pointer/address is useless in device code. (Removing the __host__ decorator will not help.)

There are a number of possible solutions, one of which you've already explored: pass the function wrapped in an object of some sort. The __device__ lambda, used directly as you have it, fits that description.
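As a rough illustration of the "wrapped in an object" idea, here is a minimal sketch that wraps say_hello in a functor. The names SayHelloFunctor and run_functor_demo are hypothetical and introduced only for this example. The point is that the call to say_hello() is compiled as device code inside operator(), so no function address has to cross from host to device.

```cuda
#include <cstdio>

// Same function as in the question.
__host__ __device__ void say_hello()
{
    printf("Hello World from function!\n");
}

// Hypothetical wrapper type (not from the question). The call to
// say_hello() is resolved while compiling device code, so the object
// carries no host function pointer.
struct SayHelloFunctor
{
    __device__ void operator()() const { say_hello(); }
};

template<class Function>
__global__ void generic_kernel(Function f)
{
    f();
}

// Hypothetical helper: launches the kernel and reports whether the
// launch and the kernel execution both completed without a CUDA error.
bool run_functor_demo()
{
    generic_kernel<<<1,1>>>(SayHelloFunctor{});   // functor passed by value
    cudaError_t launch_err = cudaGetLastError();
    cudaError_t sync_err = cudaDeviceSynchronize();
    return launch_err == cudaSuccess && sync_err == cudaSuccess;
}

int main()
{
    return run_functor_demo() ? 0 : 1;
}
```

A __device__ lambda whose body simply calls say_hello() should behave the same way; the functor form just avoids needing --expt-extended-lambda.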

Another possible fix for the function pointer approach that is not working is to capture the function pointer in device code. It then has to be passed to the host, where it can be passed back through a kernel launch to device code and dispatched there. The linked answer above gives a number of ways this can be accomplished.
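One way this round trip might look is sketched below, assuming say_hello is declared __device__ only. The names void_fn, d_say_hello, and run_pointer_demo are hypothetical and used only for illustration. The pointer is captured by initializing a __device__ variable, copied to the host with cudaMemcpyFromSymbol, and then handed back to the kernel as an ordinary argument.

```cuda
#include <cstdio>

// Device-only version of the function for this sketch.
__device__ void say_hello()
{
    printf("Hello World from function!\n");
}

// Hypothetical alias for the function pointer type.
typedef void (*void_fn)();

// Captured in device code: this variable holds the device address
// of say_hello.
__device__ void_fn d_say_hello = say_hello;

template<class Function>
__global__ void generic_kernel(Function f)
{
    f();
}

// Hypothetical helper: fetches the device pointer, passes it back to
// the kernel, and reports whether every CUDA call succeeded.
bool run_pointer_demo()
{
    void_fn h_say_hello = nullptr;

    // Copy the pointer value to the host. It is an opaque device
    // address here and must not be called from host code.
    cudaError_t copy_err =
        cudaMemcpyFromSymbol(&h_say_hello, d_say_hello, sizeof(void_fn));
    if (copy_err != cudaSuccess) return false;

    generic_kernel<<<1,1>>>(h_say_hello);   // dispatched in device code
    cudaError_t launch_err = cudaGetLastError();
    cudaError_t sync_err = cudaDeviceSynchronize();
    return launch_err == cudaSuccess && sync_err == cudaSuccess;
}

int main()
{
    return run_pointer_demo() ? 0 : 1;
}
```

Compared with the functor approach, this keeps a plain function pointer interface, at the cost of an extra copy and an indirect call inside the kernel.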
