5

When I run my code with TensorFlow directly, everything is normal.

However, when I run it in a screen window, I get the following error.

ImportError: libcuda.so.1: cannot open shared object file: No such file or directory

I have tried the command:

source /etc/profile

But it doesn't work.

Because I connect to the servers over SSH, I need to use screen.

How can I fix it?

einpoklum
dwqy11

5 Answers

9

Steps to follow:
Find libcuda.so.1:

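echo $LD_LIBRARY_PATH  # show the current library search path
sudo find /usr/ -name 'libcuda.so.*'  # locate the installed libcuda files and their versions

Then add the directory that contains it to $LD_LIBRARY_PATH (in my case /usr/local/cuda-10.0/compat) by running the following command in the terminal: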

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.0/compat
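The two steps above can be combined into one small shell function. This is only a sketch and not part of the original answer: the name add_libcuda_dir is hypothetical, it assumes a POSIX-style shell with find, head, and dirname available, and it simply takes the first libcuda.so.1 that find reports, which may not be the copy you want if several are installed.

```shell
# Hypothetical helper (not from the original answer): append the directory
# that holds libcuda.so.1 to LD_LIBRARY_PATH, unless it is already listed.
add_libcuda_dir() {
    # Directory tree to search; defaults to /usr as in the steps above.
    search_root="${1:-/usr}"

    # Keep only the first match that find reports.
    lib_path="$(find "$search_root" -name 'libcuda.so.1' 2>/dev/null | head -n 1)"
    if [ -z "$lib_path" ]; then
        echo "libcuda.so.1 not found under $search_root" >&2
        return 1
    fi

    lib_dir="$(dirname "$lib_path")"
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$lib_dir:"*)
            ;;  # already on the path, nothing to do
        *)
            # Append, adding a ':' separator only if the variable is non-empty.
            export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$lib_dir"
            ;;
    esac
}
```

Calling add_libcuda_dir /usr in the shell that will launch the program (for example, inside the screen window itself) updates the variable for that shell only. The membership check keeps the same directory from being appended again on repeated calls.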
TkrA
5

Background

libcuda.so.1 is the library for interacting with the CUDA driver (as opposed to CUDA's "Runtime API", for which you need libcudart.so.*).

Now, it's quite possible to have the CUDA Toolkit properly installed, without the driver being properly installed. And this error could be the result of building a (non-statically-linked) CUDA application in this situation.

Alternatively, it could be the case that there's some misconfiguration of the library search path - because normally, libcuda.so.* are supposed to be installed in some directory on that path!

So, what's on that search path? As explained here, it is:

  1. directories from $LD_LIBRARY_PATH
  2. directories from /etc/ld.so.conf
  3. /lib
  4. /usr/lib

A typical scenario would be for /etc/ld.so.conf to add, say, /usr/lib/x86_64-linux-gnu; and for libcuda.so.* to be there.
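To see which of those directories actually hold the driver library on a given machine, something like the following sketch may help. The function names are hypothetical and not from the original answer. It only walks $LD_LIBRARY_PATH plus /lib and /usr/lib, skips empty path entries, and does not parse /etc/ld.so.conf.

```shell
# Print each directory listed in LD_LIBRARY_PATH on its own line.
# Empty entries are dropped in this sketch.
list_ld_library_path() {
    printf '%s\n' "${LD_LIBRARY_PATH:-}" | tr ':' '\n' | sed '/^$/d'
}

# Succeed if the given directory contains a libcuda.so.1 entry.
has_libcuda() {
    [ -e "$1/libcuda.so.1" ]
}

# Check the LD_LIBRARY_PATH directories first, then /lib and /usr/lib,
# and report every directory where libcuda.so.1 is present.
check_search_path() {
    { list_ld_library_path; printf '%s\n' /lib /usr/lib; } |
    while IFS= read -r dir; do
        if has_libcuda "$dir"; then
            echo "found: $dir/libcuda.so.1"
        fi
    done
}
```

If check_search_path prints nothing, the library may still be reachable through a directory configured in /etc/ld.so.conf. On glibc systems, ldconfig -p | grep libcuda lists what the linker cache knows about.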

Bottom line

Here's what you should do:

  1. Make sure an up-to-date CUDA driver has been properly installed. If it hasn't, download and install it; that should solve the problem.
  2. Locate the libcuda.so.1 file (e.g. using locate). If it's been placed somewhere weird that's not in the library search path - act as in step 1.
  3. If you wanted the driver library installed someplace weird, then add that path to your user's $LD_LIBRARY_PATH.
einpoklum
3

Try adding the directory that contains libcuda.so.1 to the LD_LIBRARY_PATH environment variable.

Example (replace /path/of/libcuda.so.1 with that directory):

export LD_LIBRARY_PATH=/path/of/libcuda.so.1:$LD_LIBRARY_PATH
Jethro Sandoval
  • Thanks! But it doesn't work. I ran your command and then reloaded the profile with `source /etc/profile` and `source bash_profile`; neither works. Are there any other methods? – dwqy11 Jan 21 '19 at 08:39
  • Thanks! It worked. But the directory to add to `$LD_LIBRARY_PATH` has to be found manually. – dwqy11 Jan 21 '19 at 08:48
  • You can put the above code in ~/.bashrc so that it runs and updates LD_LIBRARY_PATH every time you open a terminal. – Jethro Sandoval Feb 06 '19 at 04:58
0

In my case, I develop inside a Docker container, and I took the following steps:

  1. Confirm that your Docker container was started with NVIDIA GPU access.
  2. Find libcuda.so.1: sudo find /usr/ -name 'libcuda.so.*'
  3. Then add the directory that contains it to $LD_LIBRARY_PATH (in my case /usr/local/cuda-11.5/compat) by running the following command in the terminal: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.5/compat
0

If you're trying to run the job inside a container, try starting it with nvidia-docker run instead of docker run. Additional instructions can be found here: https://github.com/NVIDIA/nvidia-docker

Aaditya Singh