templated class, nontemplate member: does the member need explicit instantiation in one translation unit to be seen in another w/ a fwd declaration

Question

This question comes from a CUDA context, where multiple compilers are in use, so bear with me while I present the situation (otherwise you might ask why this problem would come up, or suggest things that would be a waste of time). Just about every C++03 or C++11 rule applies, depending on the compile flags. I believe there are a few exceptions, but let's pretend there are none, as this contrived setup could be replicated without nvcc.

dvector.h:

// Parsable by both NVIDIA's CUDA compiler (nvcc) and good old gcc and kin.
// gcc-compiled code is expected to link against and call subroutines compiled with CUDA for GPU acceleration.
#include <stddef.h> // size_t
template<typename T>
struct dvector{ // a vector for manipulating GPU memory
    size_t length;
    T *data;
    void fill(const T &val);

};

#if __CUDACC__ // only when being parsed by NVIDIA's compiler
template<typename T>
void dvector<T>::fill(const T &val){
   //do some nvcc-only stuff, gcc wouldn't understand
}
#endif

Suppose we are compiling two object files; both include the header containing this declaration and the conditionally compiled definition.

srcfile.cpp is compiled with gcc
srcfile.cu is compiled with nvcc
srcfile.h usable with both nvcc and gcc

srcfile.h:

#include "dvector.h"
void call_work_kernel(dvector<int> &v);

srcfile.cu:

#include "srcfile.h"

template<typename T>
__global__ void work_kernel(dvector<T> v){ // purposely by value: this is the way to pass it
    const size_t tid = threadIdx.x + blockDim.x * blockIdx.x; // global thread index
    if(tid < v.length){
        v.data[tid] += 1;
    }
}

void call_work_kernel(dvector<int> &v){
     const size_t blocksize = 32;
     const size_t n_blocks = (v.length + blocksize - 1) / blocksize;
     dim3 blockDims(n_blocks);
     dim3 blockSize(blocksize);
     work_kernel<<<blockDims, blockSize>>>(v);

}

srcfile.cpp:

#include "srcfile.h"

int main(){
    dvector<int> v; // allocation of v.data and v.length omitted
    v.fill(42);
    call_work_kernel(v);
}

So, playing by the standard, will the implicit instantiation of dvector<int> also instantiate the member function dvector<int>::fill? Does the answer change between a routine taking dvector<int>& and one taking dvector<int> by value?

The answer here quotes what looks to be a draft standard, although I don't have a copy so I can't tell for sure.
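To make the question concrete, here is a minimal single-file sketch in plain C++ (no CUDA, so it only approximates the nvcc/gcc split). The names `dvector_sketch`, `never_called`, `length_by_ref`, `length_by_value` and `fill_and_read_back` are hypothetical and not part of the code above. As far as I understand the standard, implicitly instantiating a class template instantiates the declarations of its member functions but not their definitions; a member definition is instantiated only when it is used in a way that requires it, or when it is explicitly instantiated. The sketch relies on that: `never_called` would be ill-formed for `int` if its body were ever instantiated, yet the file compiles.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for dvector, plain C++ only.
template<typename T>
struct dvector_sketch {
    std::size_t length;
    T *data;
    void fill(const T &val);               // non-template member, defined out of class
    void never_called() { T::missing(); }  // ill-formed for T = int, but only if instantiated
};

template<typename T>
void dvector_sketch<T>::fill(const T &val) {
    for (std::size_t i = 0; i < length; ++i) data[i] = val;
}

// Both functions need the complete class, so dvector_sketch<int> is
// implicitly instantiated here. Neither one forces the definition of
// fill or never_called to be instantiated.
std::size_t length_by_ref(dvector_sketch<int> &v) { return v.length; }
std::size_t length_by_value(dvector_sketch<int> v) { return v.length; }

// Explicit instantiation definition of this one member. In the setup
// from the question, a line like this would have to live in the
// translation unit that can see the definition (assumed: the .cu file).
template void dvector_sketch<int>::fill(const int &);

// Small driver: fills a 4-element buffer and returns the last element.
int fill_and_read_back() {
    int storage[4] = {0, 0, 0, 0};
    dvector_sketch<int> v = {4, storage};
    v.fill(42);
    assert(length_by_ref(v) == 4 && length_by_value(v) == 4);
    return storage[3];
}
```

If that reading is right, trivial use of `dvector<int>` in srcfile.cu would not by itself oblige nvcc to emit `dvector<int>::fill`, whether the routine takes the vector by reference or by value; an explicit instantiation like the one sketched here is one way to force it. Note that explicitly instantiating the whole class (`template struct dvector_sketch<int>;`) would also instantiate `never_called` and fail to compile in this sketch.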

FYI [Where do I find the current C or C++ standard documents?](http://stackoverflow.com/q/81656) — Shafik Yaghmour, Oct 08 '15 at 01:49
Obviously, the instantiation can't be done by `gcc`, because it never has the definition of `dvector::fill()` visible. To be clear, the implicit instantiation you're referring to is `dvector` as named in the prototype of `call_work_kernel`, which is seen by `nvcc` in a context in which the definition of `dvector::fill()` is visible. And you want to know if the standard says nvcc *must* instantiate this member, *may* instantiate it, or *must not*? — Phil Miller, Oct 08 '15 at 02:07
@Novelocrat not as named in the prototype, but in trivial usage (accessing the data member, maybe calling other non-fill functions, had they existed) in the call_work_kernel definition. I want to know whether, given this minimal usage, a standards-compliant compiler would be required to instantiate this member. — Jason Newton, Oct 08 '15 at 02:14
Could you please edit to show examples of sort of accesses/calls in the definition of `call_work_kernel` you have in mind? — Phil Miller, Oct 08 '15 at 02:24
@novelocrat ok I've introduced more cuda into it to get you a typical example, basically assume fill is not going to be explicitly instantiated or called in the cu file. — Jason Newton, Oct 08 '15 at 03:56
