Function templates in CUDA do not link

Question

I want to create an application using CUDA 10.1 (Visual Studio 2017, 64-bit). For flexibility I want to use templates, since the pixel types and dimensions can vary. Code snippets are below. However, the build fails with linker error LNK2019:

unresolved external symbol "__declspec(dllimport) void __cdecl CUDA_gradient<float,float,3>(float *,unsigned int *,float *,float *)" (__imp_??$CUDA_gradient@MM$02@@YAXPEAMPEAI00@Z) referenced in function "protected: virtual void __cdecl itk::CudaGradientImageFilter<class itk::CudaImage<float,3>,float,float,class itk::CudaImage<class itk::CovariantVector<float,3>,3> >::GPUGenerateData(void)" (?GPUGenerateData@?$CudaGradientImageFilter@V?$CudaImage@M$02@itk@@MMV?$CudaImage@V?$CovariantVector@M$02@itk@@$02@2@@itk@@MEAAXXZ)

According to some other questions, CUDA does not have a problem with templates.

What are the caveats and main pitfalls of CUDA templates in generic programming?

The function call in my cpp file is:

#include "itkCudaGradientImageFilter.h"
#include "itkCudaGradientImageFilter.hcu"
// ...
template <typename TInputImage, typename TOperatorValueType, typename TOutputValueType, typename TOutputImageType>
void
CudaGradientImageFilter<TInputImage,
                       TOperatorValueType,
                       TOutputValueType,
                       TOutputImageType>
::GPUGenerateData()
{
    //...the InputPixelType and the InputImageDimension are acquired somewhere else.
    CUDA_gradient<InputPixelType, OutputValueType, InputImageDimension>(pin, outputSize, outputSpacing, pout);
    //...

}

The outcome does not change when I call:

CUDA_gradient<float, float, 3>(pin, outputSize, outputSpacing, pout);

I have a CUDA header file (.hcu):

#include "ImageFeaturesExport.h"

    template<typename TInputPixelType,
             typename TOutputValueType,
             unsigned int TImageDimension = 3>
    void
    ImageFeatures_EXPORT
    CUDA_gradient(
        TInputPixelType* dev_in,
        unsigned int* size,
        float* spacing,
        TOutputValueType* dev_out
    );

and the corresponding functions in the .cu file:

#include "itkCudaGradientImageFilter.hcu"
#include "cuda.h"
#include "cuda_runtime.h"
#include "device_launch_parameters.h"

template<typename TInputPixelType,
         typename TOutputPixelType,
         unsigned int TImageDimension>
__global__
void
gradient_kernel(cudaTextureObject_t in, TOutputPixelType* grad)
{
    //...compute gradient
}

template<typename TInputPixelType, typename TOutputValueType, unsigned int TImageDimension>
void
CUDA_gradient(
    TInputPixelType* dev_in,
    unsigned int* size,
    float* spacing,
    TOutputValueType* dev_out)
{
    // prepare texture, copy memory,...

    gradient_kernel<TInputPixelType, TOutputValueType, TImageDimension><<< dimGrid, dimBlock >>> (texObj, dev_out);
    CUDA_CHECK_ERROR;

    // Clean up [...]
}

What happens if you move the template definitions to the header? — AndyG, Jul 10 '19 at 11:55
I doubt the answer has anything to do with CUDA. You haven't shown any template instantiation anywhere where the code for the templates is defined. If it doesn't exist -- that is your problem — talonmies, Jul 10 '19 at 11:56
Should I add the instantiations as generic_opto_guy's answer pointed out? I tried, but it still fails. I have now added more information about the context in which the function is called. Could you point me in a direction? I did not find a proper example to study. — KabCode, Jul 10 '19 at 13:02
Using either visual C++ or g++, you *must* instantiate the template instances you require in the same translation unit where they are defined, otherwise the compiler will not emit any code for them. So yes, you have to do what is outlined in the answer — talonmies, Jul 10 '19 at 13:30

score 3 · Accepted Answer · 2019-07-10T12:28:06.400


You have a link error.

First, notice that your .cpp and .cu files are compiled independently.

So nvcc has no way of knowing for which template parameters it has to compile the function CUDA_gradient.

You can make it link by adding the following explicit instantiation to your .cu file:

template void CUDA_gradient<float,float,3>(
    float* dev_in,
    unsigned int* size,
    float* spacing,
    float* dev_out);

Of course, this only allows the use of these exact template parameters. But if you know the set of possible template parameters, you can copy, paste, and adapt this instantiation for each combination, and you are good.
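If the set of combinations grows, a small macro can keep the instantiation list readable. The following is only a sketch in plain C++, not the code from the question: `host_gradient` is a hypothetical stand-in for the host-side wrapper (its body is a placeholder forward difference along the first axis, with no kernel launch), and `INSTANTIATE_HOST_GRADIENT` is a made-up helper name. In a real project only the declaration would live in the header, while the definition and the instantiation list would sit together in the .cu file.

```cpp
#include <cassert>
#include <cstddef>

// Declaration: this is all the calling .cpp file would see via the header.
template <typename TIn, typename TOut, unsigned int Dim>
void host_gradient(const TIn* in, const unsigned int* size,
                   const float* spacing, TOut* out);

// Definition: would live in the .cu file next to the kernel.
// Placeholder body (assumption): forward difference along the first axis.
template <typename TIn, typename TOut, unsigned int Dim>
void host_gradient(const TIn* in, const unsigned int* size,
                   const float* spacing, TOut* out)
{
    assert(size[0] > 0 && "first dimension must be non-empty");

    // Total number of pixels is the product of the extents.
    std::size_t n = 1;
    for (unsigned int d = 0; d < Dim; ++d) n *= size[d];

    for (std::size_t i = 0; i < n; ++i) {
        // The last pixel of each row has no right neighbour; write 0 there.
        const bool last_in_row = (i % size[0]) == size[0] - 1;
        out[i] = last_in_row
                     ? TOut(0)
                     : static_cast<TOut>((in[i + 1] - in[i]) / spacing[0]);
    }
}

// One explicit instantiation definition per supported combination.
// These force the compiler to emit code in this translation unit.
#define INSTANTIATE_HOST_GRADIENT(TIn, TOut, Dim)                  \
    template void host_gradient<TIn, TOut, Dim>(                   \
        const TIn*, const unsigned int*, const float*, TOut*);

INSTANTIATE_HOST_GRADIENT(float, float, 2)
INSTANTIATE_HOST_GRADIENT(float, float, 3)
INSTANTIATE_HOST_GRADIENT(short, float, 3)
```

A caller in another translation unit can then use, for example, `host_gradient<float, float, 3>(in, size, spacing, out);`. Any combination that is not in the list will again fail at link time with an unresolved external symbol.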

edited Jul 10 '19 at 12:28

answered Jul 10 '19 at 12:05


Small correction -- it isn't the kernel which is the source of the linking issue, it is the host side wrapper function – talonmies Jul 10 '19 at 12:20
