Unresolved extern function while creating objects of template class in CUDA device code and host code

Question

I define a class template in the files template.cu and template.cuh. I mark the constructor and destructor as callable from both host and device code by using the __host__ and __device__ qualifiers.

template.cuh

#pragma once

#include "cuda_runtime.h"

template<class T>
class Foo
{
public:

    __host__ __device__
    Foo();

    __host__ __device__
    ~Foo();
};

template.cu

#include "template.cuh"

template<class T>
__host__ __device__
Foo<T>::Foo()
{

}

template<class T>
__host__ __device__
Foo<T>::~Foo()
{

}

// Instantiating template of type int
template class Foo<int>;

My main function is in kernel.cu, which includes the template.cuh header. I simply create a Foo<int> object in both host and device code.

#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include "template.cuh"

__global__ void addKernel(int *c, const int *a, const int *b)
{
    Foo<int> f;

    int i = threadIdx.x;
    c[i] = a[i] + b[i];
}

int main()
{
    Foo<int> t;
    return 0;
}

When I compile these files in a Visual Studio 2013 project of type "NVIDIA CUDA 6.5 Runtime", I get an unresolved extern function error with the following log:

1>  c:\Users\admin\documents\visual studio 2013\Projects\Test\Testtemplates>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\bin\nvcc.exe" -gencode=arch=compute_20,code=\"sm_20,compute_20\" --use-local-env --cl-version 2013 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin"  -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\include"  -G   --keep-dir Debug -maxrregcount=0  --machine 32 --compile -cudart static  -g   -DWIN32 -D_DEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd  " -o Debug\kernel.cu.obj "c:\Users\admin\documents\visual studio 2013\Projects\Test\Testtemplates\kernel.cu"     
1>  ptxas fatal   : Unresolved extern function '_ZN3FooIiEC1Ev'    
1>  kernel.cu

What am I doing wrong here?

I suspect the same rules for templates still apply here: http://stackoverflow.com/questions/495021/why-can-templates-only-be-implemented-in-the-header-file — πάντα ῥεῖ, Apr 13 '15 at 11:09
You haven't instantiated the templated function instances you wish to call at translation unit scope, as far as I can see. — talonmies, Apr 13 '15 at 11:13
@πάνταῥεῖ: Notice that the last line of template.cu explicitly instantiates Foo<int>. Shouldn't that work? — nurabha, Apr 13 '15 at 11:14
@talonmies: You mean I have to explicitly instantiate every templated member function? — nurabha, Apr 13 '15 at 11:18
@nurabha: No I mean that you are trying to compile and link a device executable binary payload using `kernel.cu` (and nothing else). There is no template instantiation for those member functions in that file, or anything which that file includes into the translation unit scope. — talonmies, Apr 13 '15 at 11:21
@πάνταῥεῖ: Why not? The link you shared gives this answer: // Foo.h // no implementation template <typename T> struct Foo { ... }; //---------------------------------------- // Foo.cpp // implementation of Foo's methods // explicit instantiations template class Foo<int>; template class Foo<float>; // You will only be able to use Foo with int or float — nurabha, Apr 13 '15 at 11:22
@πάνταῥεῖ: OK, the only solution that works is the first method in the link you shared, where I include the implementation file in the header file. Explicitly instantiating the class template for a specific type in the implementation file doesn't work. — nurabha, Apr 13 '15 at 11:38
@nurabha: It does work; I guess you just did not set the correct compiler flags in VS. Try the CMake version in my answer below and compare the generated project settings with the ones you use. — m.s., Apr 13 '15 at 11:41
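For reference, the header-only approach mentioned in the comments (making the member definitions visible in every translation unit) could look roughly like the sketch below. It is a minimal sketch under assumptions: the FOO_HOST_DEVICE macro, the value() accessor and the value_ member are hypothetical additions (the macro only lets the same file build with a plain host compiler, the member gives the object observable state), and the header and the kernel are shown as one file for brevity. Because kernel.cu then sees the full definitions, nvcc can instantiate Foo<int> for both host and device code in that one translation unit, without -dc.

```cpp
// Hypothetical header-only variant of template.cuh, followed by the kernel.
// In a real project the part above the kernel would live in template.cuh.

#ifdef __CUDACC__
#include "cuda_runtime.h"
// Under nvcc: mark members callable from host and device code.
#define FOO_HOST_DEVICE __host__ __device__
#else
// Under a plain host compiler the qualifiers are not available; drop them.
#define FOO_HOST_DEVICE
#endif

template <class T>
class Foo
{
public:
    FOO_HOST_DEVICE Foo();
    FOO_HOST_DEVICE ~Foo();

    // Illustrative accessor, not part of the original class.
    FOO_HOST_DEVICE T value() const;

private:
    T value_;  // illustrative member, value-initialised by the constructor
};

// Definitions stay in the header, so every includer can instantiate them.
template <class T>
FOO_HOST_DEVICE Foo<T>::Foo() : value_()  // value_() gives 0 for int
{
}

template <class T>
FOO_HOST_DEVICE Foo<T>::~Foo()
{
}

template <class T>
FOO_HOST_DEVICE T Foo<T>::value() const
{
    return value_;
}

#ifdef __CUDACC__
// kernel.cu part: Foo<int>'s device constructor is instantiated right here,
// so ptxas has nothing external to resolve.
__global__ void addKernel(int *c, const int *a, const int *b)
{
    Foo<int> f;

    int i = threadIdx.x;
    c[i] = a[i] + b[i] + f.value();
}
#endif
```

The trade-off is that every translation unit using Foo<int> compiles its own copy of the members, in exchange for not needing relocatable device code at all.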

m.s. · Accepted Answer · 2015-04-13T11:49:22.833


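You get this error because you did not use separate compilation with device code linking. Without it, ptxas must resolve every device function called from kernel.cu within kernel.cu alone, but the device version of Foo<int>::Foo() (mangled as _ZN3FooIiEC1Ev) is only generated when template.cu is compiled. Have a look at this article: Separate Compilation and Linking of CUDA C++ Device Code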

I just tried the following with your code and it worked for me. Pay attention to the additional flag -dc (short for --device-c, which compiles each file to an object containing relocatable device code):

nvcc template.cu kernel.cu -dc
nvcc template.o kernel.o -o kernel

I don't have much experience with Visual Studio itself; I prefer to let CMake generate the correct settings for VS.

The following CMakeLists.txt file worked for me on Linux with gcc. You might try it on Windows with VS and then compare the generated project settings with the ones you use.

PROJECT(kernel)
FIND_PACKAGE(CUDA REQUIRED)

SET(CUDA_SEPARABLE_COMPILATION ON)
CUDA_ADD_EXECUTABLE(kernel template.cuh template.cu kernel.cu)

edited Apr 13 '15 at 11:49

answered Apr 13 '15 at 11:22

m.s.



You need to keep in mind that not all architectures support separate compilation and device linkage. So while this is a perfectly good answer, it isn't a universal solution to the problem. – talonmies Apr 13 '15 at 11:23
@talonmies: -dc=true is already set for the project and for both .cu files. I still get the linking error in Visual Studio 2013. – nurabha Apr 13 '15 at 11:26
@nurabha: There isn't any `--device-c` or `-dc` in the compiler output in your question that I can see... – talonmies Apr 13 '15 at 11:42
@talonmies: Well, you are right. There was a problem with the -dc setting. It works now. Thanks for the help. – nurabha Apr 13 '15 at 11:54
