I want to create an application using CUDA 10.1 (on Visual Studio 2017, 64-bit). For flexibility I want to use templates, since the pixel types and image dimensions can vary. Code snippets are below.
However, the project fails to link and reports error LNK2019:
unresolved external symbol "__declspec(dllimport) void __cdecl CUDA_gradient<float,float,3>(float *,unsigned int *,float *,float *)" (__imp_??$CUDA_gradient@MM$02@@YAXPEAMPEAI00@Z) referenced in function "protected: virtual void __cdecl itk::CudaGradientImageFilter<class itk::CudaImage<float,3>,float,float,class itk::CudaImage<class itk::CovariantVector<float,3>,3> >::GPUGenerateData(void)" (?GPUGenerateData@?$CudaGradientImageFilter@V?$CudaImage@M$02@itk@@MMV?$CudaImage@V?$CovariantVector@M$02@itk@@$02@2@@itk@@MEAAXXZ)
According to other questions, CUDA itself has no problem with templates.
What are the caveats and main pitfalls of using templates with CUDA for generic programming?
The function call in my cpp file is:
#include "itkCudaGradientImageFilter.h"
#include "itkCudaGradientImageFilter.hcu"
// ...
template <typename TInputImage, typename TOperatorValueType, typename TOutputValueType, typename TOutputImageType>
void
CudaGradientImageFilter<TInputImage,
TOperatorValueType,
TOutputValueType,
TOutputImageType>
::GPUGenerateData()
{
    // ... InputPixelType and InputImageDimension are acquired elsewhere.
    CUDA_gradient<InputPixelType, OutputValueType, InputImageDimension>(pin, outputSize, outputSpacing, pout);
    // ...
}
The outcome does not change when I call it with explicit template arguments:
CUDA_gradient<float, float, 3>(pin, outputSize, outputSpacing, pout);
I have a CUDA header file (.hcu):
#include "ImageFeaturesExport.h"
template<typename TInputPixelType,
typename TOutputValueType,
unsigned int TImageDimension = 3>
void
ImageFeatures_EXPORT
CUDA_gradient(
TInputPixelType* dev_in,
unsigned int* size,
float* spacing,
TOutputValueType* dev_out
);
and the corresponding functions in the .cu file:
#include "itkCudaGradientImageFilter.hcu"
#include "cuda.h"
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
template<typename TInputPixelType,
typename TOutputPixelType,
unsigned int TImageDimension>
__global__
void
gradient_kernel(cudaTextureObject_t in, TOutputPixelType* grad)
{
//...compute gradient
}
template<typename TInputPixelType, typename TOutputValueType, unsigned int TImageDimension>
void
CUDA_gradient(
TInputPixelType* dev_in,
unsigned int* size,
float* spacing,
TOutputValueType* dev_out)
{
// prepare texture, copy memory,...
gradient_kernel<TInputPixelType, TOutputValueType, TImageDimension><<< dimGrid, dimBlock >>> (texObj, dev_out);
CUDA_CHECK_ERROR;
// Clean up [...]
}
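One possible cause, offered as a guess rather than a confirmed diagnosis: the template definition lives only in the .cu file, so the compiler may never generate `CUDA_gradient<float, float, 3>` there, and the library then has no such symbol to export. The usual remedy in that situation is an explicit instantiation in the file that holds the definition. The sketch below is host-only C++ so that it compiles without nvcc. The function body is a hypothetical placeholder standing in for the texture setup and kernel launch, and the export macro is left out.

```cpp
#include <cassert>
#include <cstddef>

// Declaration, as it would appear in the header (export macro omitted in this sketch).
template <typename TInputPixelType,
          typename TOutputValueType,
          unsigned int TImageDimension = 3>
void CUDA_gradient(TInputPixelType* dev_in,
                   unsigned int* size,
                   float* spacing,
                   TOutputValueType* dev_out);

// Definition, as it would appear in the .cu file.
template <typename TInputPixelType,
          typename TOutputValueType,
          unsigned int TImageDimension>
void CUDA_gradient(TInputPixelType* dev_in,
                   unsigned int* size,
                   float* spacing,
                   TOutputValueType* dev_out)
{
    assert(dev_in && size && spacing && dev_out);
    (void)spacing; // unused by the placeholder body

    // Total number of pixels across all dimensions.
    std::size_t count = 1;
    for (unsigned int d = 0; d < TImageDimension; ++d)
        count *= size[d];

    // Hypothetical placeholder: copies input to output.
    // The real code would prepare the texture and launch gradient_kernel here.
    for (std::size_t i = 0; i < count; ++i)
        dev_out[i] = static_cast<TOutputValueType>(dev_in[i]);
}

// Explicit instantiation: forces code for this exact combination to be
// generated in this translation unit, so that the linker can find it.
template void CUDA_gradient<float, float, 3>(float*, unsigned int*, float*, float*);
```

Each pixel type and dimension combination that the .cpp side calls would need its own instantiation line. In the DLL setup from the question, that line would presumably also have to carry `ImageFeatures_EXPORT` so the instantiated symbol is actually exported. That part is an assumption and is not shown here.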