I ran into a problem when using overloaded kernel functions in CUDA.
I understand that CUDA can pick which overloaded kernel to launch from its arguments.
However, I would like to use cudaOccupancyMaxPotentialBlockSize() (see doc)
to calculate the block size for maximum occupancy, and that call is given only the kernel name, with no arguments:
__global__ void foo_cuda_kernel(int a)
{
    /* implementation 1 */
}

// overloaded kernel function
__global__ void foo_cuda_kernel(int a, int b)
{
    /* implementation 2 */
}

void foo_cuda(int thread_num) // thread_num: total number of threads to launch
{
    int min_grid_size, grid_size, block_size;
    cudaOccupancyMaxPotentialBlockSize
    (
        &min_grid_size, &block_size,
        foo_cuda_kernel, // how does it distinguish overloaded functions?
        0, thread_num
    );
    grid_size = (thread_num + block_size - 1) / block_size;
    // I understand that the compiler can pick the kernel to launch from its arguments
    foo_cuda_kernel<<<grid_size, block_size>>>((int)1);
    cudaDeviceSynchronize();
}

How can I make this work? How does cudaOccupancyMaxPotentialBlockSize()
distinguish between overloaded functions?
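
For reference, here is a minimal sketch of one way the overload might be selected explicitly. It assumes that cudaOccupancyMaxPotentialBlockSize() is a function template that deduces the kernel's type from the argument (as it is declared in the CUDA runtime headers), so a bare overloaded name cannot be deduced, while a cast to the intended function-pointer type can. The alias OneArgKernel, the thread_num parameter, and the cudaError_t return value are illustrative choices, not part of the original snippet.

```cuda
#include <cuda_runtime.h>

__global__ void foo_cuda_kernel(int a)
{
    /* implementation 1 */
}

// overloaded kernel function
__global__ void foo_cuda_kernel(int a, int b)
{
    /* implementation 2 */
}

// Hypothetical alias naming the signature of the one-argument overload.
using OneArgKernel = void (*)(int);

// thread_num (assumed here): total number of threads to launch.
cudaError_t foo_cuda(int thread_num)
{
    int min_grid_size = 0, block_size = 0;

    // The target type of the cast tells the compiler which overload is meant,
    // so the template parameter is deduced as void (*)(int).
    OneArgKernel kernel = static_cast<OneArgKernel>(foo_cuda_kernel);

    cudaError_t err = cudaOccupancyMaxPotentialBlockSize(
        &min_grid_size, &block_size,
        kernel,
        0,           // dynamic shared memory per block, in bytes
        thread_num); // block size limit, as in the original call
    if (err != cudaSuccess) return err;

    int grid_size = (thread_num + block_size - 1) / block_size;

    // The launch itself is still resolved from the argument list.
    foo_cuda_kernel<<<grid_size, block_size>>>(1);
    err = cudaGetLastError(); // reports launch configuration errors
    if (err != cudaSuccess) return err;

    return cudaDeviceSynchronize();
}
```

With this sketch, switching to the two-argument kernel would mean casting to void (*)(int, int) instead and passing two arguments at the launch site.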