CUDA & Kernel Wrapper & Template & Compile Error

Question

I ran into a problem when using templates in kernel wrapper functions.

Here is the code I originally had in mind:

//----------------------------------------  
// cuda_demo.cuh
template<typename T> 
void kernel_wrapper(T param);

//----------------------------------------   
// cuda_demo.cu
#include <cuda.h>
#include <cuda_runtime.h>
#include "cuda_demo.cuh"

template<typename T>
__global__ void my_kernel(T param) { 
    // do something 
}

template<typename T>
void kernel_wrapper(T param) { 
    my_kernel<<<1,1>>>(param);
}

//---------------------------------------- 
// main.cpp
#include "cuda_demo.cuh"
int main() {
  int param = 10;
  kernel_wrapper(param);
  return 0;
}

I soon found that templates should be implemented in the header file (see Why can templates only be implemented in the header file?).

That question offers two solutions. The common one is "to write the template declaration in a header file, then implement the class in an implementation file (for example .tpp), and include this implementation file at the end of the header".

So I changed the code:

//----------------------------------------  
// cuda_demo.cuh
template<typename T> 
void kernel_wrapper(T param);

#include "cuda_demo.cu"

//----------------------------------------   
// cuda_demo.cu
#include <cuda.h>
#include <cuda_runtime.h>

template<typename T>
__global__ void my_kernel(T param) { 
    // do something 
}

template<typename T>
void kernel_wrapper(T param) { 
    my_kernel<<<1,1>>>(param);
}

The compiler gives me the following error:

error: expected primary-expression before < token
   my_kernel<<<1,1>>>(param);

The same error occurs when I put all the CUDA code in "cuda_demo.cuh".

Then I tried the second solution, as follows:

//----------------------------------------  
// cuda_demo.cuh
template<typename T> 
void kernel_wrapper(T param);

//----------------------------------------   
// cuda_demo.cu
#include <cuda.h>
#include <cuda_runtime.h>
#include "cuda_demo.cuh"

template<typename T>
__global__ void my_kernel(T param) { 
    // do something 
}

template<typename T>
void kernel_wrapper(T param) { 
    my_kernel<<<1,1>>>(param);
}

template void kernel_wrapper<int>(int param);

This one works well! But in my project, 'T' is not a simple type. It may be nested, like

Class_1<Class_2<Class_3<...>>>,

which means I cannot list the specific types of 'T' in advance for explicit instantiation.

Does anybody know how to solve this?

Thanks.

You must make sure that CUDA code is in a .cu file and compile it with nvcc — talonmies, Dec 06 '18 at 16:58
I put the CUDA code in a .cu file and included it from main.cpp, but it does not work. I also found some open source projects that put CUDA code in .h or .cuh files. How do they do that? Is it a configuration problem? — Samuel Lee, Dec 07 '18 at 00:15
You can't include CUDA code into a cpp file. You must compile it separately in a .cu file, otherwise it will not work — talonmies, Dec 07 '18 at 11:29
@talonmies That’s right. Thanks, and I think I’ve found the solution. — Samuel Lee, Dec 08 '18 at 05:04

score 2 · Accepted Answer · answered Dec 08 '18 at 05:40

I found the essence of the problem.

All CUDA code must end up in .cu files so that it is compiled by nvcc. Thanks for the reminder, @talonmies.

Recently, I found some open source projects that mix CUDA and C++ code in .h or .cuh files and then include those headers from both .cpp and .cu files. That made me believe CUDA code could be compiled by gcc.

But I eventually noticed that although many .cpp files include the CUDA code, none of them call the CUDA functions. Calls to CUDA functions exist only in .cu files.

How do they do that? The answer is conditional compilation. The CUDA code is compiled by nvcc when the header is included from a .cu file, but it is skipped by gcc when the same header is included from a .cpp file.
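A minimal sketch of what such a header might look like. It assumes the guard is `__CUDACC__`, the macro nvcc predefines while compiling CUDA source; the projects I looked at may use a different macro. The helper `cuda_code_visible` is hypothetical and exists only to show which branch the compiler took.

```cpp
// cuda_demo.cuh (sketch): can be included from both .cu and .cpp files.
#ifndef CUDA_DEMO_CUH
#define CUDA_DEMO_CUH

// Declaration only: plain C++, visible to every compiler.
template <typename T>
void kernel_wrapper(T param);

#ifdef __CUDACC__  // assumption: guard on the macro nvcc defines
// Only nvcc sees this part; gcc skips everything up to #endif.
template <typename T>
__global__ void my_kernel(T param) {
    // do something
}

template <typename T>
void kernel_wrapper(T param) {
    my_kernel<<<1, 1>>>(param);
}
#endif  // __CUDACC__

// Hypothetical helper: reports whether the CUDA part was compiled.
constexpr bool cuda_code_visible() {
#ifdef __CUDACC__
    return true;
#else
    return false;
#endif
}

#endif  // CUDA_DEMO_CUH
```

With this layout a .cpp file can include the header without hitting the `<<<` syntax error, but it still must not call `kernel_wrapper`, because no definition exists in that translation unit.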

As for my original question, the most effective solution is to write the full implementation of the template CUDA code in header files and to call the kernel wrappers only from .cu files.
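A rough sketch of that layout, collapsed into one listing. Everything except `my_kernel` and `kernel_wrapper` is an assumption for illustration: the `Class_1`/`Class_2` definitions, `make_param`, and the non-template entry point `run_demo` are hypothetical names. The idea is that the .cu file calls the wrapper, so the compiler deduces the nested type there and no explicit instantiation list is needed.

```cpp
// Hypothetical nested types standing in for Class_1<Class_2<Class_3<...>>>.
template <typename T> struct Class_2 { T value; };
template <typename T> struct Class_1 { T inner; };

// Plain C++ helper (hypothetical): visible to gcc and nvcc alike.
inline Class_1<Class_2<int>> make_param(int seed) {
    return Class_1<Class_2<int>>{Class_2<int>{seed}};
}

// Non-template entry point (hypothetical) that .cpp files may call.
void run_demo(int seed);

#ifdef __CUDACC__
// --- would live in cuda_demo.cuh: full template implementation ---
template <typename T>
__global__ void my_kernel(T param) {
    // do something
}

template <typename T>
void kernel_wrapper(T param) {
    my_kernel<<<1, 1>>>(param);
}

// --- would live in a .cu file: the only place the wrapper is called ---
void run_demo(int seed) {
    // T is deduced here as Class_1<Class_2<int>>.
    kernel_wrapper(make_param(seed));
    cudaDeviceSynchronize();
}
#endif  // __CUDACC__
```

main.cpp would then include only the declaration of `run_demo` and call `run_demo(10)`, while the object file built by nvcc from the .cu file supplies the definition at link time.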

I spent a lot of time on this problem, and I hope my experience helps you.
