Runtime cudaErrorInsufficientDriver error from cudaGetDeviceCount when compiling with nvcc, icpc

Question

PROBLEM

I have an FFT-based application that uses FFTW3. I am working on porting the application to a CUDA-based implementation using CUFFT. Compiling and running the FFT core of the application standalone within Nsight works fine. I have moved from there to integrating the device code into my application.

When I run with the CUFFT core code integrated into my application, cudaGetDeviceCount returns a cudaErrorInsufficientDriver error, although I did not get it in the Nsight standalone run. This call is made at the beginning of the run, when I'm initializing the GPU.

BACKGROUND

I am running on CentOS 6, using CUDA 7.0 on a GeForce GTX 750, and icpc 12.1.5. I have also successfully tested a small example using a GT 610. Both cards work in Nsight (and I've also compiled and run command-line without problems, though not as extensively as from within Nsight).

To integrate the CUFFT implementation of the FFT core into my application, I compiled and device-linked with nvcc and then used icpc (the Intel C++ Compiler) to compile the host code and to link the device and host code to create a .so. I finally completed that step without errors or warnings (relying on this tutorial).

(The reasoning as to why I'm using a .so has a fair amount of history and additional background. Suffice it to say that making a .so is required for my application.)

The tutorial points out that the compilation steps differ between generating a standalone executable (as I do in Nsight) and generating a device-linked library for inclusion in a .so. To get through the compilation, I had to add -lcudart, as described in the tutorial, as well as -lcuda to my icpc linking call (along with -L flags pointing to .../cuda-7.0/lib64 and .../cuda-7.0/lib64/stubs, the paths to those libraries).

NOTE: nvcc links in libcudart by default. I'm assuming it does the same for libcuda, since Nsight doesn't include either of these libraries in any of its compile and linking steps. As an aside, I do find it strange that although nvcc links them in by default, they don't show up in the output of ldd on the executable.

I also had to add --compiler-options '-fPIC' to my nvcc commands to avoid errors described here.

I have seen some chatter (for one example, see this post) about Intel/NVCC compatibility problems, but it looks like they arise at compile time with older versions of NVCC, so...I think I'm ok on that account.

Finally, here are the compile commands for compilation of three .cu files (all are identical except for the name of the .cu file and the name of the .o file):

nvcc
-ccbin g++
-Iinc
-I/path/to/cuda/samples/common/inc
-m64
-O3
-gencode arch=compute_20,code=sm_20
-gencode arch=compute_30,code=sm_30
-gencode arch=compute_35,code=sm_35
-gencode arch=compute_37,code=sm_37
-gencode arch=compute_50,code=sm_50
-gencode arch=compute_52,code=sm_52
-gencode arch=compute_52,code=compute_52
--relocatable-device-code=true
--compile
--compiler-options '-fPIC'
-o my_object_file1.o
-c my_source_code_file1.cu

And here are the flags I pass to the device linking step:

nvcc
-ccbin g++
-Iinc
-I/path/to/cuda/samples/common/inc
-m64
-O3
-gencode arch=compute_20,code=sm_20
-gencode arch=compute_30,code=sm_30
-gencode arch=compute_35,code=sm_35
-gencode arch=compute_37,code=sm_37
-gencode arch=compute_50,code=sm_50
-gencode arch=compute_52,code=sm_52
-gencode arch=compute_52,code=compute_52
--compiler-options '-fPIC'
--device-link
my_object_file1.o
my_object_file2.o
my_object_file3.o
-o my_device_linked_object_file.o

I probably don't need the -gencode flags for 30, 37, and 52, at least currently, but they shouldn't cause any problems, and eventually, I will likely compile that way.

And here are the compile flags (minus the -o flag and all my -I flags) that I use for the .cc file that calls my CUDA library:

-c
-fpic
-D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64
-fno-operator-names
-D_REENTRANT
-D_POSIX_PTHREAD_SEMANTICS
-DM2KLITE -DGCC_
-std=gnu++98
-O2
-fp-model source
-gcc
-wd1881
-vec-report0

Finally, here are my linking flags:

-pthread
-shared

Any ideas on how to fix this problem?

Are you compiling and linking with the intel compiler in g++ compatibility mode? — talonmies, Sep 30 '15 at 20:50
@talonmies I was under the impression that Intel compiles in g++ compatibility mode by default, and you would have to explicitly shut it off using the --no-gcc flag. Having said that, I will update my post to show my compiling and linking flags. — MrMas, Sep 30 '15 at 23:26
Based on all cases I have ever encountered, "insufficient driver" means your driver is too old, relative to the version of the CUDA runtime being used. Update your driver package to the latest available for your platform. Since modern CUDA versions come with a matching driver package included, it is not clear how you wound up with an out-of-date driver. — njuffa, Oct 01 '15 at 20:06
@njuffa, as I stated in my post, the problem doesn't truly appear to be a driver problem since I can get my application to run from Nsight. Having said that, there is probably a way for the wrong driver or wrong runtime library to be loaded, creating a mismatch. Based on the results from ldd, I don't think this is the problem, but any ideas outside of using ldd to check this would be appreciated. — MrMas, Oct 01 '15 at 20:10
Instead of hypothesizing further, my suggestion is to simply install the latest driver package (that is only a few minutes of work) to check if that fixes the problem. I have never seen a false positive "insufficient driver" error being emitted by the CUDA runtime, which doesn't mean such a bug is impossible, it just seems very unlikely. — njuffa, Oct 01 '15 at 20:14
@njuffa, point taken. I did update the driver yesterday but failed to update the runtime library so it broke everything. I'm reliant on IT doing it so I'm not sure it was done correctly. — MrMas, Oct 01 '15 at 20:16
This does not add up in my mind. An older CUDA runtime should work just fine with a newer CUDA driver; it's the reverse scenario that does not work and gives rise to the "insufficient driver" error. In other words, you should be able to install newer drivers without updating the CUDA runtime. You may want to check with your IT department to see what exactly it is they are doing. — njuffa, Oct 01 '15 at 20:19
@njuffa, I did update the driver, though I'm now certain that the problem is not the driver but a bad LD_LIBRARY_PATH. See my answer below. Thanks very much for your time. Your response ultimately led me down the correct path. If you were to put an answer down, I'd select it as the correct one. — MrMas, Oct 01 '15 at 23:28
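
One way to check for a library mismatch beyond ldd, as asked in the comments above, is to list every directory on the search path that contains a candidate libcuda.so. The following is a minimal bash sketch, not a definitive diagnostic: the function name find_lib_in_path is hypothetical, and it only inspects a colon-separated path such as LD_LIBRARY_PATH (it does not account for rpath entries or the system loader cache).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print, in search order, each directory of a
# colon-separated search path that holds a file whose name starts with
# the given library name (for example "libcuda.so" also matches "libcuda.so.1").
# Usage: find_lib_in_path NAME [SEARCH_PATH]   (defaults to $LD_LIBRARY_PATH)
find_lib_in_path() {
    local name="$1"
    local search_path="${2-${LD_LIBRARY_PATH-}}"
    local dir candidate
    local -a dirs
    # Split the search path on ':' without touching the global IFS.
    IFS=':' read -r -a dirs <<< "$search_path"
    for dir in "${dirs[@]}"; do
        [ -n "$dir" ] || continue          # skip empty entries
        for candidate in "$dir"/"$name"*; do
            if [ -e "$candidate" ]; then   # the glob matched a real file
                printf '%s\n' "$dir"
                break
            fi
        done
    done
}

# Example: find_lib_in_path libcuda.so
```

If the first directory printed ends in /stubs, a stub libcuda.so is likely to shadow the one installed by the driver.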

score 1 · Accepted Answer · edited May 23 '17 at 11:51

Don't add .../cuda-7.0/lib64/stubs to LD_LIBRARY_PATH. If you do, libcuda.so is picked up at run time from the stubs directory instead of from the installed driver. (See this post.)
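
As a minimal sketch of that fix, assuming a bash shell: the hypothetical helper strip_stubs below removes every entry ending in /stubs from a colon-separated search path. The function name is an illustrative assumption and is not part of the original setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the given colon-separated search path with
# every entry whose last component is "stubs" removed.
# Empty entries are dropped as well.
strip_stubs() {
    local input="$1"
    local out="" entry
    local -a entries
    # Split on ':' without touching the global IFS.
    IFS=':' read -r -a entries <<< "$input"
    for entry in "${entries[@]}"; do
        case "$entry" in
            ""|*/stubs|*/stubs/) continue ;;   # skip empty and stubs entries
        esac
        out="${out:+$out:}$entry"              # append with ':' separator
    done
    printf '%s\n' "$out"
}

# Possible use before launching the application:
#   export LD_LIBRARY_PATH="$(strip_stubs "$LD_LIBRARY_PATH")"
```

The stubs directory can still be passed with -L at link time; the sketch only keeps it out of the run-time search path.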


answered Oct 01 '15 at 23:33

MrMas
