How to run a CUDA cooperative template kernel

Question

I am trying, unsuccessfully, to launch a template kernel as a cooperative kernel in CUDA C++. What am I doing wrong?

Error:

cannot determine which instance of function template "boolPrepareKernel" is intended

I try to invoke the kernel like this:

    ForBoolKernelArgs<int> fbArgs = ...;

    int device = 0;
    cudaDeviceProp deviceProp;
    cudaGetDeviceProperties(&deviceProp, device);
    cudaLaunchCooperativeKernel((void*)boolPrepareKernel, deviceProp.multiProcessorCount, fFArgs.threads, fbArgs);

The kernel is defined like this:

template <typename TYO>
__global__ void boolPrepareKernel(ForBoolKernelArgs<TYO> fbArgs) {
...
}

I tried to parametrize the launch (in this example with int) like this:

    cudaLaunchCooperativeKernel((void*)(<int>boolPrepareKernel), deviceProp.multiProcessorCount, fFArgs.threads, fbArgs) ;

but I get the error:

no instance of overloaded function matches the argument list
    argument types are: (<error-type>, int, dim3, ForBoolKernelArgs<int>)

For the suggested case:

cudaLaunchCooperativeKernel((void*)(boolPrepareKernel<int>), deviceProp.multiProcessorCount, fFArgs.threads, fbArgs)

I get this error:

no instance of overloaded function matches the argument list
    argument types are: (void *, int, dim3, ForBoolKernelArgs<int>)

This is probably something simple, but I am stuck. Thanks for any help!

For reference, a regular kernel launch like

boolPrepareKernel<<<fFArgs.blocks, fFArgs.threads>>>(fbArgs);

works, but then grid synchronization is of course unavailable.

`<int>boolPrepareKernel` seems wrong, why not `boolPrepareKernel<int>`? — paleonix, Jan 26 '22 at 16:21
This won't work: `cudaLaunchCooperativeKernel((void*)boolPrepareKernel,...`. You need to indicate which template specialization you want there. I also suggest providing a minimal reproducible example rather than a bunch of snippets. It will make it easier for others to help you. — Robert Crovella, Jan 26 '22 at 16:30
Thanks! @RobertCrovella, this is exactly what I want to do. How do I indicate that I want to specialize it to int? — Jakub Mitura, Jan 26 '22 at 16:31
The [prototype](https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__EXECUTION.html#group__CUDART__EXECUTION_1g504b94170f83285c71031be6d5d15f73) for `cudaLaunchCooperativeKernel` includes argument types as: `( const void* func, dim3 gridDim, dim3 blockDim, void** args, size_t sharedMem, cudaStream_t stream )` Your error message has indicated that the arguments you are passing don't match those types, in that order. Why not make your arguments match the intended types? — Robert Crovella, Jan 26 '22 at 16:36
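Following that comment, one way to keep every argument matched to the prototype is to let a small helper build the `void**` array from the kernel's own parameter list. This is a minimal sketch, not a CUDA runtime API: `launchCooperative` and `NoDeduce` are hypothetical names, and the `val` field of `ForBoolKernelArgs` is an assumption, since the real struct is not shown.

```cuda
#include <cuda_runtime.h>

// Stand-in for the question's struct; the field is assumed for illustration.
template <typename TYO>
struct ForBoolKernelArgs {
    TYO val;
};

template <typename TYO>
__global__ void boolPrepareKernel(ForBoolKernelArgs<TYO> fbArgs) {
    // body elided
}

// Hypothetical helper. NoDeduce puts the trailing arguments in a
// non-deduced context, so Params comes only from the kernel pointer and
// each argument is converted to the exact kernel parameter type.
template <typename T>
struct NoDeduce { using type = T; };

template <typename... Params>
cudaError_t launchCooperative(void (*kernel)(Params...), dim3 grid, dim3 block,
                              typename NoDeduce<Params>::type... args) {
    // The runtime expects one pointer per kernel parameter, in order.
    // The trailing nullptr only keeps the array non-empty for kernels
    // without parameters.
    void* argPtrs[] = { static_cast<void*>(&args)..., nullptr };
    return cudaLaunchCooperativeKernel(reinterpret_cast<const void*>(kernel),
                                       grid, block, argPtrs,
                                       /*sharedMem=*/0, /*stream=*/0);
}
```

With such a helper, the call from the question could become `launchCooperative(boolPrepareKernel<int>, dim3(deviceProp.multiProcessorCount), fFArgs.threads, fbArgs);`, assuming `fFArgs.threads` is a `dim3` as the error message suggests. The explicit `<int>` is still required: a bare template name cannot be deduced from a function-pointer parameter.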

score 3 · Accepted Answer · answered Jan 26 '22 at 17:07

Here is a minimal example that will compile:

$ cat t1954.cu
template <typename TYO>
struct ForBoolKernelArgs
{
    TYO val;
};

template <typename TYO>
__global__ void boolPrepareKernel(ForBoolKernelArgs<TYO> fbArgs) {
}


int main(){
  ForBoolKernelArgs<int> fbArgs;
  void *kernel_args[] = {&fbArgs};
  cudaLaunchCooperativeKernel((void*)(boolPrepareKernel<int>), 1, 1, kernel_args) ;
}
$ nvcc -o t1954 t1954.cu
$

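The main remaining issue is probably that you are not passing the kernel arguments the way the API expects: `cudaLaunchCooperativeKernel` takes a `void**` array holding one pointer per kernel parameter (here `kernel_args`), not the arguments themselves.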
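Since the point of the cooperative launch is grid synchronization, here is a sketch of a template kernel that actually calls `grid.sync()` between two phases. It assumes a CUDA 11 or newer toolkit (older toolkits also required compiling with `-rdc=true` for grid synchronization) and a device reporting `cudaDevAttrCooperativeLaunch`. The `in`/`out`/`n` fields of `ForBoolKernelArgs` and the name `runGridSyncDemo` are hypothetical, chosen only for illustration.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <vector>

namespace cg = cooperative_groups;

// Hypothetical fields: the question's real ForBoolKernelArgs is not shown.
template <typename TYO>
struct ForBoolKernelArgs {
    TYO* in;   // filled in phase 1
    TYO* out;  // filled in phase 2
    int n;     // element count
};

// Phase 2 reads an element that a different block may have written in
// phase 1, so it is only correct with a grid-wide barrier in between.
template <typename TYO>
__global__ void boolPrepareKernel(ForBoolKernelArgs<TYO> fbArgs) {
    cg::grid_group grid = cg::this_grid();
    int start = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;

    for (int i = start; i < fbArgs.n; i += stride)
        fbArgs.in[i] = static_cast<TYO>(i + 1);

    grid.sync();  // valid only under a cooperative launch

    for (int i = start; i < fbArgs.n; i += stride)
        fbArgs.out[i] = fbArgs.in[(i + 1) % fbArgs.n];
}

// Returns 1 if the output is correct, 0 if it is not (or the launch failed),
// and -1 if no device with cooperative-launch support is available.
int runGridSyncDemo() {
    const int device = 0;
    int coop = 0;
    if (cudaSetDevice(device) != cudaSuccess ||
        cudaDeviceGetAttribute(&coop, cudaDevAttrCooperativeLaunch, device) != cudaSuccess ||
        coop == 0)
        return -1;

    // Name one specialization explicitly; the bare template name cannot.
    void (*kernel)(ForBoolKernelArgs<int>) = boolPrepareKernel<int>;

    // All blocks of a cooperative grid must be resident at the same time,
    // so the grid is sized from occupancy, not from the problem size.
    const int threads = 256;
    int blocksPerSm = 0, smCount = 0;
    if (cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSm, kernel, threads, 0) != cudaSuccess ||
        cudaDeviceGetAttribute(&smCount, cudaDevAttrMultiProcessorCount, device) != cudaSuccess ||
        blocksPerSm == 0)
        return -1;
    dim3 grid(blocksPerSm * smCount), block(threads);

    ForBoolKernelArgs<int> fbArgs{};
    fbArgs.n = 1 << 20;
    cudaMalloc(&fbArgs.in, fbArgs.n * sizeof(int));
    cudaMalloc(&fbArgs.out, fbArgs.n * sizeof(int));

    // One pointer per kernel parameter, in declaration order.
    void* kernelArgs[] = { &fbArgs };
    cudaError_t err = cudaLaunchCooperativeKernel(
        (const void*)kernel, grid, block, kernelArgs, 0, 0);
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    std::vector<int> host(fbArgs.n);
    if (err == cudaSuccess)
        err = cudaMemcpy(host.data(), fbArgs.out, fbArgs.n * sizeof(int),
                         cudaMemcpyDeviceToHost);
    cudaFree(fbArgs.in);
    cudaFree(fbArgs.out);
    if (err != cudaSuccess) return 0;

    // out[i] should equal in[(i + 1) % n], i.e. ((i + 1) % n) + 1.
    for (int i = 0; i < fbArgs.n; ++i)
        if (host[i] != (i + 1) % fbArgs.n + 1) return 0;
    return 1;
}
```

The grid here is `blocksPerSm * smCount` because a cooperative launch is rejected if the grid is larger than what can be co-resident. The question's choice of `deviceProp.multiProcessorCount` blocks should also work as long as at least one block of the kernel fits per SM, but it may leave capacity unused.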
