I have a library with some __host__ __device__ functions. I also have an #ifdef __CUDACC__ gadget which makes sure that a regular C++ compiler doesn't see the __host__ __device__ qualifiers and can thus compile those functions.
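To illustrate the idea behind the gadget, here is a minimal sketch of it written as a macro. The macro name MY_LIB_HD is hypothetical and is not part of CUDA or of my actual sources, which spell out the #ifdef at each declaration instead. A plain C++ compiler does not define __CUDACC__, so it sees the macro expand to nothing.

```cpp
// Sketch only: MY_LIB_HD is a made-up name used for illustration.
#ifdef __CUDACC__
// nvcc defines __CUDACC__, so the function is compiled for host and device.
#define MY_LIB_HD __host__ __device__
#else
// A regular C++ compiler never sees the CUDA qualifiers.
#define MY_LIB_HD
#endif

// Declaration, as it would appear in a header.
MY_LIB_HD void foo(int* x, int* y);

// Definition, as it would appear in the library source.
MY_LIB_HD void foo(int* x, int* y) { *x = *y; }
```

Compiled with a regular C++ compiler, this yields an ordinary host-only foo().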
Now, I want to put the compiled host-side versions of my library's functions in a plain-vanilla C++ static library file (.a on Linux), and I would even like that library to be compilable when CUDA is unavailable. I want the compiled device-side versions in a separate static library.
I am almost there (I think), but am stuck with a linking error. Here are toy sources for such a library, a test program (which calls both the device-side and the host-side version of a function) and the build commands I use.
What am I getting wrong?
my_lib.hpp (library header):
#ifdef __CUDACC__
__host__ __device__
#endif
void foo(int*x, int* y);
int bar();
my_lib.cu (library source):
#include "my_lib.hpp"
#ifdef __CUDACC__
__host__ __device__
#endif
void foo(int*x, int* y) { *x = *y; }
int bar() { return 5; }
main.cu (test program):
#include "my_lib.hpp"
__global__ void my_kernel() {
    int z { 78 };
    int w { 90 };
    foo(&z,&w);
}
int main() {
    int z { 123 };
    int w { 456 };
    foo(&z,&w);
    my_kernel<<<1,1>>>();
    cudaDeviceSynchronize();
    cudaDeviceReset();
}
My build commands:
c++ -c -x c++ -o my_lib-noncuda.o my_lib.cu
ar qc my_lib-noncuda.a my_lib-noncuda.o
ranlib my_lib-noncuda.a
nvcc -dc -o my_lib-cuda.o my_lib.cu
ar qc my_lib-cuda.a my_lib-cuda.o
ranlib my_lib-cuda.a
nvcc -dc -o main.rdc.o main.cu
nvcc -dlink -o main.o main.rdc.o my_lib-cuda.a
c++ -o main main.o my_lib-noncuda.a -lcudart
And these are the errors I get on the last (linking) command:
/usr/bin/ld: main.o: in function `__cudaRegisterLinkedBinary_39_tmpxft_00003f88_00000000_6_main_cpp1_ii_e7ab3416':
link.stub:(.text+0x5a): undefined reference to `__fatbinwrap_39_tmpxft_00003f88_00000000_6_main_cpp1_ii_e7ab3416'
/usr/bin/ld: main.o: in function `__cudaRegisterLinkedBinary_41_tmpxft_00003f69_00000000_6_my_lib_cpp1_ii_ab44b3f6':
link.stub:(.text+0xaa): undefined reference to `__fatbinwrap_41_tmpxft_00003f69_00000000_6_my_lib_cpp1_ii_ab44b3f6'
collect2: error: ld returned 1 exit status
Notes:
- I use CUDA 10.1 and g++ 9.2.1 on Devuan GNU/Linux.
- This is a follow-up to a deleted question; @talonmies commented that I had better show exactly what I did, and that changed the question somewhat.
- Somewhat-related question: this one.