When using CUDA, I often need to compare execution times in single and double precision (float/double). To avoid copy-pasting methods, I normally use templates to switch between float and double.
The problem starts when I have to call external functions, such as those from the cuSPARSE/cuBLAS libraries. In that case you have, for example:
cublasSaxpy() // single precision
cublasDaxpy() // double precision
The lazy, simplest solution is to copy-paste the methods:
void myFloatMethod(float var)
{
    // do stuff in float
    cublasSaxpy(var); // arguments abbreviated
}
void myDoubleMethod(double var)
{
    // do stuff in double
    cublasDaxpy(var); // arguments abbreviated
}
I have already searched for this problem, and the only solution I found is to define the functions globally, like this:
#define cublasTaxpy cublasSaxpy // or cublasDaxpy
#define DATATYPE float // or double
and then use cublasTaxpy instead of cublasSaxpy/cublasDaxpy everywhere. Whenever I want to change the precision, I only change the defines, without duplicating code or going through the entire code base.
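To make the workaround concrete, here is a minimal sketch of the macro approach. It is an illustration only: saxpyStub and daxpyStub are hypothetical CPU stand-ins for cublasSaxpy/cublasDaxpy (they do not use the real cuBLAS signatures), and taxpyStub and scaleAndAdd are names made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU stand-ins for cublasSaxpy / cublasDaxpy.
// These are NOT the real cuBLAS signatures; both compute y = alpha * x + y.
void saxpyStub(int n, float alpha, const float* x, float* y) {
    for (int i = 0; i < n; ++i) y[i] = alpha * x[i] + y[i];
}
void daxpyStub(int n, double alpha, const double* x, double* y) {
    for (int i = 0; i < n; ++i) y[i] = alpha * x[i] + y[i];
}

// Change these two lines together to switch precision
// (double / daxpyStub for the double-precision build).
#define DATATYPE float
#define taxpyStub saxpyStub

// Written once against DATATYPE and taxpyStub, so there is no duplicated body.
// y is taken by value, updated in place, and returned.
std::vector<DATATYPE> scaleAndAdd(DATATYPE alpha,
                                  const std::vector<DATATYPE>& x,
                                  std::vector<DATATYPE> y) {
    assert(x.size() == y.size());
    taxpyStub(static_cast<int>(x.size()), alpha, x.data(), y.data());
    return y;
}
```

With this setup only one precision exists per build, so comparing float against double means changing the defines and recompiling.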
Is there a better, more proper way to do this?