
A CUDA distribution, at least on Linux, has a "stub libraries" directory, which contains, among others, a libcuda.so file - named the same as the actual NVIDIA driver library.

When building a CUDA program which makes driver API calls, on a system with both the CUDA toolkit and the NVIDIA driver installed - when should I link against the driver library stub, and when against the actual driver library?

Notes:

  • In case it matters, assume a recent CUDA release (version 11.x or 12.x).
  • The installed NVIDIA driver version may differ from the driver version bundled with the CUDA distribution; that may be one of the relevant factors.
  • If we're using a build-system generator, e.g. CMake, this question is basically moot, since we let it decide where to locate the relevant libraries - and its choice works. See this question about doing that.
einpoklum
    If the version of `libcuda.so` and the stub are the same, it should not matter. The stub is provided primarily so that people who are building codes on a machine without a GPU can still build using the driver API, which requires linking against `libcuda.so`. The stub is there so that if you do not have a GPU and therefore no GPU driver installed, you can still link. In a production machine, the driver install is what provides the `libcuda.so` used for production. – Robert Crovella Aug 27 '23 at 21:40
  • @RobertCrovella: Well, they might be the same, but they're not necessarily the same, since the installed driver version may not be the one bundled with the CUDA version. – einpoklum Aug 28 '23 at 09:03
  • That is certainly true. I was responding only for the case where they are the same. When they are not the same, in my view the question converts to one of CUDA versioning. For the case where stub version is newer than your driver install, and you wish to run on that machine, your machine install is broken (error: unsupported ptx version). For the case where stub version is older than your driver install, it will depend on exactly what your objectives are. In the absence of any narrowing of scope, a complete answer would have to delve into all the nooks and crannies of cuda compatibility. – Robert Crovella Aug 28 '23 at 14:01
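To see which of the scenarios described in the comments applies to a given machine, one can compare the toolkit's CUDA version with the newest CUDA version the installed driver supports. A minimal sketch, assuming `nvcc` and `nvidia-smi` are on `PATH` (either may be absent, e.g. on a GPU-less build machine, so the script degrades to "unknown" rather than failing):

```shell
#!/bin/sh
# Sketch: report the toolkit's CUDA version and the maximum CUDA version
# the installed driver supports, so the two can be compared.
toolkit_ver="unknown"
driver_supported_ver="unknown"
if command -v nvcc >/dev/null 2>&1; then
    # nvcc prints a line like: "Cuda compilation tools, release 12.2, V12.2.140"
    toolkit_ver=$(nvcc --version | sed -n 's/.*release \([0-9.]*\),.*/\1/p')
fi
if command -v nvidia-smi >/dev/null 2>&1; then
    # The nvidia-smi banner reports "CUDA Version: NN.N" - the newest
    # CUDA version the installed driver supports.
    driver_supported_ver=$(nvidia-smi | sed -n 's/.*CUDA Version: \([0-9.]*\).*/\1/p')
fi
echo "toolkit: ${toolkit_ver}  driver supports up to: ${driver_supported_ver}"
```

If the toolkit (and thus stub) version is newer than what the driver supports, you are in the "broken install" scenario from the comments; if it is older, CUDA's compatibility rules apply.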

1 Answer


I would always counsel linking against the stub library rather than the local libcuda, where one exists. I say that for exactly three reasons:

  1. It’s portable. If you provision any system with the toolkit, you have the stub library which matches the toolkit version you have, whether it is on a machine with GPU hardware or not.
  2. It’s in a predictable location. If you know where the toolkit is, you (or just about any build system) can derive a relative path that will get you to the stub. That isn’t the case for the local libcuda library, particularly on systems which provide third-party repackaged drivers, where things can be in very non-standard places. I have seen this wreak havoc on automatic packaging and distribution systems.
  3. It is what nvcc does. When in Rome….
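In a hand-written build, following this advice just means deriving the stub directory from the toolkit root instead of hunting for the driver's libcuda.so. A minimal sketch, where `CUDA_HOME`, the fallback directory, and the final `cc` line are illustrative defaults for a conventional Linux install:

```shell
#!/bin/sh
# Sketch: prefer the toolkit's stub directory for linking; fall back to a
# typical driver-installed location only if no stub directory exists.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"
STUB_DIR="${CUDA_HOME}/lib64/stubs"
if [ -d "${STUB_DIR}" ]; then
    LINK_DIR="${STUB_DIR}"                # preferred: matches the toolkit version
else
    LINK_DIR="/usr/lib/x86_64-linux-gnu"  # fallback: driver's libcuda.so (distro-dependent)
fi
# The link step itself; -lcuda resolves against libcuda.so in LINK_DIR.
echo "cc main.c -L${LINK_DIR} -lcuda -o app"
```

Note that the stub only matters at link time: at runtime the dynamic loader resolves libcuda.so.1 from the installed driver regardless of what you linked against.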
talonmies
  • +1, but mostly because of reason (3): it's also portable to check which of the two exists and apply the same logic everywhere regarding which to prefer, without making assumptions. So I would rephrase (1) and (2) as: "It's easier to do this predictably and portably." – einpoklum Aug 28 '23 at 09:05