Why is it not possible to overload a host/device member function of a CUDA C++ class?

Question

I have a 3D vector class with member functions marked as both host and device functions. Below is a snippet of one of the member functions:

__host__ __device__
double Vector::GetMagReciprocal()
{
    double result = 1/sqrt(x*x + y*y + z*z);
    return result;
}

What I want is a separate definition for host and device, so that I can get better performance by using the CUDA math function rsqrt when executing on the device. The way I would do it is to overload this member function for host and device:

__host__
double Vector::GetMagReciprocal()
{
    double result = 1/sqrt(x*x + y*y + z*z);
    return result;
}

__device__
double Vector::GetMagReciprocal()
{
    double result = rsqrt(x*x + y*y + z*z);
    return result;
}

Now when I compile the Vector.cpp file with nvcc (using the -x cu flag), I get the following error:

function "Vector::GetMagReciprocal" has already been defined

Now I wonder why NVIDIA doesn't support this sort of overloading.

I can think of alternate ways of achieving the separation, but they have their own issues:

- Create separate member functions for host and device in the vector class, say GetMagReciprocalHost and GetMagReciprocalDevice, and call the appropriate one from host or device code.
- Keep a single member function GetMagReciprocal, but pass it a flag that chooses between the host code and the device code.

Maybe there is an easier way to achieve this. Any suggestions would be appreciated.

EDIT: I had not mentioned the possibility of conditional compilation using the __CUDA_ARCH__ macro to generate separate host and device code. This was actually the first thing I did when modifying the member function, but something then made me think it would not work. Perhaps I misunderstood how this macro is used. So the answer suggested by sgarizvi is the right answer.

`__CUDA_ARCH__` *does* solve the problem. You decorate a single function with both `__host__` and `__device__`, and then use conditional compilation with the `__CUDA_ARCH__` macro to change code generation for device code. – Robert Crovella, Apr 16 '15 at 08:25
This is the first thing I did. But it seems to me it doesn't work. I have to test to see if this is so. — nurabha, Apr 16 '15 at 08:28
If you provide a short, complete code that someone else could copy, paste, compile and run, and see whatever the problem is, I'm sure it can be explained. But little code snippets along with assertions that "this doesn't work" is not possible to sort out for you. If this question were not closed as a duplicate, it could also be closed as lacking an [MCVE](http://stackoverflow.com/help/mcve), in my opinion. Please be sure to provide an MCVE in the future. — Robert Crovella, Apr 16 '15 at 08:33

score 9 · Accepted Answer · answered Apr 16 '15 at 08:20

You can use the __CUDA_ARCH__ macro for conditional compilation to generate different code for host and device inside a single __host__ __device__ function.

__CUDA_ARCH__ is defined only while device code is being compiled, so to create different implementations for host and device, you can do the following:

__host__ __device__
double Vector::GetMagReciprocal()
{
    double result;
    #ifdef __CUDA_ARCH__
    result = rsqrt(x*x + y*y + z*z);
    #else
    result = 1/sqrt(x*x + y*y + z*z);
    #endif
    return result;
}
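
For anyone who wants something to copy, paste and compile, here is a minimal self-contained sketch of the same idea. It is not the asker's actual class: Vector is assumed to be a plain struct with public x, y, z members, and the HD macro and MagReciprocalKernel are hypothetical names introduced only for this illustration. HD expands to nothing when the file is built by an ordinary C++ compiler, so the same source can be built both with and without nvcc.

```cpp
#include <cmath>

// HD is a hypothetical convenience macro (not part of CUDA): it adds the
// CUDA qualifiers only when the file is being compiled by nvcc.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Assumed layout: the original question does not show the class definition.
struct Vector {
    double x, y, z;

    HD double GetMagReciprocal() const {
#ifdef __CUDA_ARCH__
        // Device compilation pass: use the CUDA reciprocal square root.
        return rsqrt(x * x + y * y + z * z);
#else
        // Host compilation pass (or a plain C++ build).
        return 1.0 / std::sqrt(x * x + y * y + z * z);
#endif
    }
};

#ifdef __CUDACC__
// Hypothetical kernel that exercises the device branch of the function.
__global__ void MagReciprocalKernel(const Vector* v, double* out) {
    *out = v->GetMagReciprocal();
}
#endif
```

Calling GetMagReciprocal from host code takes the #else branch, while the call inside the kernel is compiled during the device pass and takes the rsqrt branch. The two results can differ in the last bits, since rsqrt and 1/sqrt are not guaranteed to round identically.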

answered Apr 16 '15 at 08:20

sgarizvi


I always compile with nvcc so it generates just same code for host and device function. Doesn't work – nurabha Apr 16 '15 at 08:24
This is absolutely the correct answer and any assertion that this doesn't work is completely incorrect. – talonmies Apr 16 '15 at 08:27
@nurabha No, it doesn't generate the same code for host and device. `nvcc` is not a compiler. It's a compiler driver which passes the code to the actual host and device compilers at the backend. Both versions of the code will be generated: one by the host compiler and one by the device compiler. – sgarizvi Apr 16 '15 at 08:28
@sgarizvi: I know nvcc is a compiler driver. – nurabha Apr 16 '15 at 08:30
@nurabha... alright, can you please explain what is the intended purpose of what you are seeking? I mean why do you need 2 separate functions? May be we can provide an alternate solution if you can add more detail. – sgarizvi Apr 16 '15 at 08:34
@sgarizvi: let me just step through the code in a debug session so I can verify. I am sure your answer is right. Actually I was already doing the conditional compilation you are suggesting, but then I looked at it again and it didn't seem like it would work – nurabha Apr 16 '15 at 08:49
@sgarizvi: well, I had changed the whole vector class a month back and specifically used the conditional flag in all those member function I wanted a separate device/host definition. I think I just got confused. Thanks for answering. I should have never asked this stupid question in first place without revising my CUDA concepts – nurabha Apr 16 '15 at 09:14
