How to link host code with a static CUDA library after separable compilation?

Question

Alright, I have a really troubling CUDA 5.0 question about how to link things properly. I'd be really grateful for any assistance!

Using the separable compilation features of CUDA 5.0, I generated a static library (*.a). This links nicely with other *.cu files when run through nvcc; I have done this many times.

I'd now like to take a *.cpp file and link it against the host code in this static library using g++ or whatever, but not nvcc. If I attempt this, I get compiler errors like

undefined reference to __cudaRegisterLinkedBinary

I'm using both -lcuda and -lcudart and, to my knowledge, have the libraries in the correct order (meaning -lmylib -lcuda -lcudart). I don't think it is an issue with that. Maybe I'm wrong, but I feel I'm missing a step and that I need to do something else to my static library (device linking?) before I can use it with g++.

Have I missed something crucial? Is this even possible?

Bonus question: I want the end result to be a dynamic library. How can I achieve this?

I've already tried that, it didn't do anything. The actual command I am using is: g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro mycpplib.o mycudalib.a -L/usr/local/cuda-5.0/lib64 -L/usr/local/cuda-5.0/lib -lmystaticlib -lcuda -lcudart -lcudadevrt -o mylinkedlib.so - I am trying to create a Python module for my CUDA library. — user2333829, Apr 30 '13 at 09:06
The error about __cudaRegisterLinkedBinary being undefined actually occurs when I try to import the Python module. g++ does actually compile everything without complaining. — user2333829, Apr 30 '13 at 09:14
You need to use `nvcc` (or `nvlink`) to link, not `g++`. `g++` doesn't know how to link together device objects. — Jared Hoberock, Apr 30 '13 at 15:46
Is there a way to take my static library as a *.a file and device link it with nvcc, then pass whatever the output of that is to g++ to link with the host code? My cpp file contains no CUDA code -- is device linking the right thing here? — user2333829, Apr 30 '13 at 17:24
Have a look at this: http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#using-separate-compilation-in-cuda — Jared Hoberock, Apr 30 '13 at 18:23
Jared's claim that you can't use g++ to link is not really correct, as the doc explains. See my answer below. — harrism, Apr 30 '13 at 23:02
-lcudadevrt is only necessary if you are using CUDA Dynamic Parallelism. — harrism, Apr 30 '13 at 23:03
If these solutions are not working, I've found that setting CUDA_RESOLVE_DEVICE_SYMBOLS ON (if using CMake) fixes the issue. — JAustin, Jul 13 '18 at 20:46

harrism · Accepted Answer · 2013-06-03T06:18:11.707

When you link with nvcc, it does an implicit device link along with the host link. If you use the host compiler to link (like with g++), then you need to add an explicit step to do a device link with the -dlink option, e.g.

nvcc -arch=sm_35 -dc a.cu b.cu
nvcc -arch=sm_35 -dlink a.o b.o -o dlink.o
g++ a.o b.o dlink.o x.cpp -lcudart

There is an example of exactly this in the Using Separate Compilation chapter of the nvcc doc.

Currently we only support static libraries for relocatable device code. We’d be interested in learning how you would want to use such code in a dynamic library. Please feel free to answer in the comments.

Edit:

To answer the question in the comment below, "Is there any way to use nvcc to turn mylib.a into something that can be put into g++?"

Just use the library like an object, like this:

nvcc -arch=sm_35 -dlink mylib.a -o dlink.o
g++ x.cpp dlink.o mylib.a -lcudart
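
For reference, the whole sequence from .cu sources to a g++-linked executable could be sketched as the script below. This is only an illustration under assumptions: the file names (a.cu, b.cu, x.cpp, libmylib.a, app), the CUDA library path, and the RUN switch with its run helper are hypothetical and not taken from the question. By default the script only prints the commands; set RUN=1 to execute them on a machine with the CUDA toolkit installed.

```shell
#!/bin/sh
# Sketch only: file names and the CUDA install path are assumptions.
CUDA_LIB_DIR="${CUDA_LIB_DIR:-/usr/local/cuda/lib64}"
ARCH="${ARCH:-sm_35}"

# Hypothetical helper: print each command, and execute it only when RUN=1.
run() {
    echo "$*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

build() {
    # 1. Compile to relocatable device code (produces a.o and b.o).
    run nvcc -arch="$ARCH" -dc a.cu b.cu
    # 2. Archive the objects into a static library.
    run ar rcs libmylib.a a.o b.o
    # 3. Device-link the library. The output contains the linked device
    #    code, not the host functions, so it is much smaller than the library.
    run nvcc -arch="$ARCH" -dlink libmylib.a -o dlink.o
    # 4. Host link: pass both dlink.o and the library, then the CUDA runtime.
    #    The library comes after the files that reference it.
    run g++ x.cpp dlink.o libmylib.a -L"$CUDA_LIB_DIR" -lcudart -o app
}

build
```

Note that dlink.o does not replace the library: the host functions still live in libmylib.a, so both have to appear on the g++ line. GNU ld resolves static archives from left to right, which is why this sketch lists the library after x.cpp and dlink.o.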

Mike Murphy · Answer 2 · score 0 · answered May 02 '13 at 20:41

You can use libraries anywhere you use objects. So just do

nvcc -arch=sm_35 -dlink mylib.a -o dlink.o
g++ x.cpp dlink.o mylib.a -lcudart

Thanks very much for your suggestion, Mike, I appreciate it. But, I have strange behaviour when I try exactly the nvcc command you proposed. Yes, the nvcc command runs and doesn't complain. However, when I try and put the new object file through g++, it seems that all of my functions are undefined. A quick inspection of file size shows that the original mylib.a is 988K, whereas the object after device linking is only 56K. That can't be right, any idea what's up? (Thanks again!) – user2333829 May 07 '13 at 15:45
