
When using CUDA, I often need to compare execution times in single and double precision (float/double). To avoid copy-pasting functions, I usually use templates to switch between float and double.

The problem starts when I have to call external functions, such as those from the cuSPARSE/cuBLAS libraries. There, each precision has its own function name, for example:

cublasSaxpy() // single precision
cublasDaxpy() // double precision

The lazy, simplest solution is to copy-paste the functions:

void myFloatMethod(float var)
{
    // do stuff in float
    cublasSaxpy(var);
}

void myDoubleMethod(double var)
{
    // do stuff in double
    cublasDaxpy(var);
}

I searched for this problem, and the only solution I found is to define the function names globally with macros, like this:

#define cublasTaxpy cublasSaxpy // or cublasDaxpy
#define DATATYPE float // or double

and then use cublasTaxpy instead of cublasSaxpy/cublasDaxpy. Each time I want to change the precision, I only change the defines, without duplicating code or going through the entire code base.

Is there a cleaner, more proper way to do this?

Kriegalex

1 Answer


Instead of a macro, you can create overloads of cublasTaxpy():

void cublasTaxpy(float f) { cublasSaxpy(f); }
void cublasTaxpy(double d) { cublasDaxpy(d); }
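
To show how such overloads combine with the templates mentioned in the question, here is a minimal sketch. It assumes nothing about your real code: fakeSaxpy and fakeDaxpy are hypothetical host-side stand-ins for cublasSaxpy and cublasDaxpy (so it compiles without CUDA), and axpy and scaledAdd are illustrative names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for cublasSaxpy / cublasDaxpy so the sketch builds
// without CUDA. Each computes y = alpha * x + y on the host.
void fakeSaxpy(int n, float alpha, const float* x, float* y) {
    for (int i = 0; i < n; ++i) y[i] += alpha * x[i];
}
void fakeDaxpy(int n, double alpha, const double* x, double* y) {
    for (int i = 0; i < n; ++i) y[i] += alpha * x[i];
}

// One overloaded name; the compiler picks the version from the argument types.
inline void axpy(int n, float alpha, const float* x, float* y) {
    fakeSaxpy(n, alpha, x, y);
}
inline void axpy(int n, double alpha, const double* x, double* y) {
    fakeDaxpy(n, alpha, x, y);
}

// Precision-agnostic caller: written once, works for float and double.
template <typename T>
void scaledAdd(T alpha, const std::vector<T>& x, std::vector<T>& y) {
    assert(x.size() == y.size());
    axpy(static_cast<int>(x.size()), alpha, x.data(), y.data());
}
```

Calling scaledAdd(2.0f, xf, yf) with std::vector<float> arguments goes through the float overload, and the same call with doubles goes through the double one. With the real library the wrappers would forward to cublasSaxpy/cublasDaxpy, which also take a handle and stride arguments.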

Alternatively, you can wrap the whole set of functions inside a specialized struct:

template<typename FLOAT> struct helper_cublas;

template<> struct helper_cublas<float> {
    static void cublasTaxpy(float f) { cublasSaxpy(f); }
    // other functions    
};

template<> struct helper_cublas<double> {
    static void cublasTaxpy(double d) { cublasDaxpy(d); }
    // other functions    
};
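
A templated caller can then pick the right set of functions through its type parameter. The following is only a sketch under assumptions: stubSaxpy and stubDaxpy are hypothetical host stand-ins for the cuBLAS calls, and BlasTraits, name() and runStep are illustrative names that play the role of helper_cublas above.

```cpp
#include <cassert>
#include <string>

// Hypothetical host stand-ins for the two cuBLAS entry points (y = alpha*x + y).
void stubSaxpy(int n, float alpha, const float* x, float* y) {
    for (int i = 0; i < n; ++i) y[i] += alpha * x[i];
}
void stubDaxpy(int n, double alpha, const double* x, double* y) {
    for (int i = 0; i < n; ++i) y[i] += alpha * x[i];
}

// Primary template left undefined: any unsupported type fails at compile time.
template <typename Real> struct BlasTraits;

template <> struct BlasTraits<float> {
    static const char* name() { return "Saxpy"; }
    static void axpy(int n, float a, const float* x, float* y) { stubSaxpy(n, a, x, y); }
};

template <> struct BlasTraits<double> {
    static const char* name() { return "Daxpy"; }
    static void axpy(int n, double a, const double* x, double* y) { stubDaxpy(n, a, x, y); }
};

// Generic code names the precision once, through its template parameter,
// and reports which routine was selected.
template <typename Real>
std::string runStep(Real alpha, const Real* x, Real* y, int n) {
    assert(n >= 0);
    BlasTraits<Real>::axpy(n, alpha, x, y);
    return BlasTraits<Real>::name();
}
```

Compared with free overloads, the struct keeps all per-precision functions in one place, and instantiating it with an unsupported type (say int) is a compile error instead of a silent implicit conversion.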
Jarod42