I'm trying to templatize a CUDA kernel on a boolean template parameter (as shown here: Should I unify two similar kernels with an 'if' statement, risking performance loss?), but I keep getting a compiler error saying that my function is not a template. I suspect I'm missing something obvious, which makes it all the more frustrating.
The following does NOT work:
util.cuh
#include "kernels.cuh"
//Utility functions
kernels.cuh
#ifndef KERNELS
#define KERNELS
template<bool approx>
__global__ void kernel(...params...);
#endif
kernels.cu
template<bool approx>
__global__ void kernel(...params...)
{
    if(approx)
    {
        //Approximate calculation
    }
    else
    {
        //Exact calculation
    }
}
template __global__ void kernel<false>(...params...); //Error occurs here
main.cu
#include "kernels.cuh"
kernel<false><<<dimGrid,dimBlock>>>(...params...);
The following DOES work:
util.cuh
#include "kernels.cuh"
//Utility functions
kernels.cuh
#ifndef KERNELS
#define KERNELS
template<bool approx>
__global__ void kernel(...params...);
template<bool approx>
__global__ void kernel(...params...)
{
    if(approx)
    {
        //Approximate calculation
    }
    else
    {
        //Exact calculation
    }
}
#endif
main.cu
#include "kernels.cuh"
kernel<false><<<dimGrid,dimBlock>>>(...params...);
If I add the explicit instantiation
template __global__ void kernel<false>(...params...);
at the end of kernels.cuh (after the kernel definition), that version also still works.
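To make the working layout concrete, here is a minimal host-only sketch in plain C++ of the same ordering: declaration, then definition, then explicit instantiation, all visible in one translation unit. The function name `compute`, its `float` parameter, and the `1 + x` approximation are hypothetical stand-ins for my real kernel, and there is no `__global__` or `<<<...>>>` launch here, so this only illustrates the template mechanics, not CUDA-specific behaviour.

```cpp
// Host-only analogue of the working kernels.cuh layout.
// Assumption: `compute` and its bodies are placeholders, not my real kernel.
#include <cmath>

// Declaration (what the header declares first).
template <bool approx>
float compute(float x);

// Definition, visible in the same translation unit as the instantiations below.
template <bool approx>
float compute(float x)
{
    if (approx)
    {
        // Approximate calculation (hypothetical): first-order series for exp(x)
        return 1.0f + x;
    }
    else
    {
        // Exact calculation
        return std::exp(x);
    }
}

// Explicit instantiation definitions, placed after the template definition.
template float compute<false>(float);
template float compute<true>(float);
```

In this form the compiler has already seen both the template declaration and its definition when it reaches the two `template float compute<...>(float);` lines, which matches the version of my CUDA code that compiles.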
With the version that does not work, I get the following errors (both referring to the line marked "Error occurs here" in kernels.cu):
kernel is not a template
invalid explicit instantiation declaration
In case it makes a difference, I compile all of my .cu files with a single nvcc command, like this:
nvcc -O3 -arch=sm_21 -I. main.cu kernels.cu -o program