I installed TensorFlow 1.6.0 (GPU version) with Anaconda in a Python 3.6.4 environment.

When I do import tensorflow as tf, I get the following error:

ImportError: libcudnn.so.7: cannot open shared object file: No such file or directory

The different versions:

  • cudnn : 7.1.1
  • cuda : 9.0.176
  • tensorflow : 1.6.0
  • Ubuntu : 16.04

I am aware of this but it did not solve my problem.

talonmies
vvvvv

4 Answers

The accepted answer (installing nvidia-cuda-toolkit) is wrong. By installing the toolkit you are basically putting a second CUDA on top of the one you already installed by following the NVIDIA guide.

The problem turned out to be an issue with symbolic links. The inspiration came from this topic: http://queirozf.com/entries/installing-cuda-tk-and-tensorflow-on-a-clean-ubuntu-16-04-install, but the actual resolution is different.

At one point during the cuDNN installation, the NVIDIA tutorial asks you to run this:

sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*

The problem with this approach is that copying the files with the libcudnn* glob breaks the symbolic links among the copied files. You can try the following command instead, but in my case it still broke the links:

sudo cp --preserve=links cuda/lib64/libcudnn* /usr/local/cuda/lib64
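A minimal sketch of why the links get lost, assuming GNU cp as shipped with Ubuntu: without -R, cp follows symbolic links named on the command line and writes full copies of their targets, and --preserve=links is, as far as I know, about hard links rather than symbolic ones. The -P (--no-dereference) flag is not part of the original steps; it is shown here only as one way to copy the links themselves. Everything runs in a scratch directory with fake files, so nothing under /usr is touched.

```shell
# Scratch directory with a fake cuDNN layout (file names are illustrative).
demo=$(mktemp -d)
mkdir "$demo/src" "$demo/plain" "$demo/nofollow"

# One real file plus two chained symlinks, like the cuDNN tarball.
echo "fake library" > "$demo/src/libcudnn.so.7.6.5"
ln -s libcudnn.so.7.6.5 "$demo/src/libcudnn.so.7"
ln -s libcudnn.so.7 "$demo/src/libcudnn.so"

# Plain cp follows the links, so each name ends up as a separate regular file.
cp "$demo"/src/libcudnn* "$demo/plain/"

# cp -P copies the links themselves instead of the files they point to.
cp -P "$demo"/src/libcudnn* "$demo/nofollow/"

# Compare the two results: links show up with "->" in the listing.
ls -l "$demo/plain" "$demo/nofollow"
```

If the second listing shows the arrows and the first does not, the same thing has most likely happened in /usr/local/cuda/lib64.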

You can verify the links by running ls -lha libcudnn* in the /usr/local/cuda/lib64 folder. If you do not see output like this:

lrwxrwxrwx 1 root root 13 May 2 20:02 libcudnn.so -> libcudnn.so.7
lrwxrwxrwx 1 root root 17 May 2 20:02 libcudnn.so.7 -> libcudnn.so.7.6.5
-rwxr-xr-x 1 root root 409M May 2 20:02 libcudnn.so.7.6.5
-rw-r--r-- 1 root root 386M May 2 20:02 libcudnn_static.a

then you have found the problem. The actual solution is to do the following:

sudo rm /usr/local/cuda/lib64/libcudnn.so
sudo rm /usr/local/cuda/lib64/libcudnn.so.7
cd /usr/local/cuda/lib64/
sudo ln -s libcudnn.so.7.6.5 libcudnn.so.7
sudo ln -s libcudnn.so.7 libcudnn.so

This removes the old "links" and creates new ones. Verify the links again with ls -lha libcudnn*. After that, run the following command in verbose mode:

sudo ldconfig -v

CHECK the output. ldconfig rebuilds the dynamic linker's cache of shared libraries, which the loader consults when it resolves libcudnn.so.7, so this step turned out to be very important. If the output says that a symbolic link is broken, or something along those lines, TensorFlow will continue to show the error from the question.
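The relinking steps above can be wrapped in a small function so that the version number is written only once. This is a sketch under assumptions: relink_cudnn is a hypothetical helper name, not part of any NVIDIA tooling, and it expects the real library to be named libcudnn.so.<full version> inside the given directory.

```shell
# relink_cudnn DIR FULL_VERSION
# Recreates libcudnn.so.<major> and libcudnn.so as symlinks inside DIR.
# Hypothetical helper for illustration only.
relink_cudnn() {
    local dir=$1
    local full=$2               # e.g. 7.6.5
    local major=${full%%.*}     # e.g. 7

    # Refuse to touch anything if the real library is not there.
    if [ ! -f "$dir/libcudnn.so.$full" ]; then
        echo "missing $dir/libcudnn.so.$full" >&2
        return 1
    fi

    # Remove whatever sits at the link names (broken copy or stale link),
    # then create relative links in the same chain as the listing above.
    rm -f "$dir/libcudnn.so" "$dir/libcudnn.so.$major"
    ln -s "libcudnn.so.$full" "$dir/libcudnn.so.$major"
    ln -s "libcudnn.so.$major" "$dir/libcudnn.so"
}
```

On a real install it would be called as relink_cudnn /usr/local/cuda/lib64 7.6.5 from a root shell, because sudo cannot see shell functions, and it should still be followed by ldconfig.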

BONUS! Make sure the following lines are appended at the end of ~/.bashrc (for example with nano ~/.bashrc):

export PATH=/usr/local/cuda/bin:/opt/nvidia/nsight-compute/2019.4.0${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDADIR=/usr/local/cuda${CUDADIR:+:${CUDADIR}}
export CUDA_HOME=/usr/local/cuda

and then run the command source ~/.bashrc
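To confirm that the exports took effect in the current shell, a quick check along these lines may help. It is a sketch: path_has is a hypothetical helper, and the two directories are the ones from the export lines above.

```shell
# path_has LIST DIR: succeeds if DIR is one of the colon-separated entries in LIST.
path_has() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Report whether each CUDA library directory is on LD_LIBRARY_PATH.
for d in /usr/local/cuda/lib64 /usr/local/cuda/extras/CUPTI/lib64; do
    if path_has "${LD_LIBRARY_PATH:-}" "$d"; then
        echo "ok:      $d is on LD_LIBRARY_PATH"
    else
        echo "missing: $d is not on LD_LIBRARY_PATH"
    fi
done
```

Wrapping the list in colons before matching avoids false hits on directories that merely share a prefix, such as /usr/local/cuda/lib64-old.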

All of the above steps assume that you did NOT use nvidia-cuda-toolkit, but instead installed from the NVIDIA CUDA repo.

Also, when installing CUDA, make sure you are not targeting 10.2. At the moment of writing, TF supports versions up to CUDA 10.1, so the following is the right way to install the necessary version:

sudo apt-cache policy cuda
sudo apt-get install cuda=10.1.243-1

Verify with:

nvcc --version
nvidia-smi
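If you want to check the toolkit version in a script rather than by eye, something like the following could work. It is a sketch that assumes nvcc prints a line of the form "Cuda compilation tools, release 10.1, V10.1.243"; cuda_release is a hypothetical helper name.

```shell
# cuda_release: reads `nvcc --version` output on stdin and prints the
# release number found on the "release X.Y," line.
cuda_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Only call nvcc if it is actually on PATH.
if command -v nvcc >/dev/null 2>&1; then
    echo "nvcc reports CUDA $(nvcc --version | cuda_release)"
else
    echo "nvcc not found on PATH"
fi
```

Keep in mind that the CUDA version shown by nvidia-smi is the highest version the driver supports, so it can legitimately differ from what nvcc reports.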

EDIT: I found the error that you should AVOID seeing after running the ldconfig command:

/usr/local/cuda-10.1/targets/x86_64-linux/lib:
...
libnppist.so.10 -> libnppist.so.10.2.0.243
libcuinj64.so.10.1 -> libcuinj64.so.10.1.243
> /sbin/ldconfig.real: /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7 is not a symbolic link
libcudnn.so.7 -> libcudnn.so.7.6.5
libnppc.so.10 -> libnppc.so.10.2.0.243
libnppicom.so.10 -> libnppicom.so.10.2.0.243
libnvgraph.so.10 -> libnvgraph.so.10.1.243
/usr/lib/x86_64-linux-gnu/libfakeroot:
...

If you see it, then something is still misconfigured.
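Since the verbose output is long, it can help to filter it down to just that complaint. A minimal sketch, with not_symlink_warnings as a hypothetical helper name; it only matches the literal message text shown above.

```shell
# not_symlink_warnings: reads ldconfig output on stdin and prints only the
# lines that contain the "is not a symbolic link" complaint.
not_symlink_warnings() {
    grep -F 'is not a symbolic link'
}

# Intended use (left commented out because it needs root):
#   sudo ldconfig -v 2>&1 | not_symlink_warnings
```

The 2>&1 is there so the filter sees the messages regardless of which stream ldconfig writes them to. No output from the filter means no such complaint was printed.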

Alex

I don't have enough reputation to comment on Alex's answer, but on Ubuntu 20.04 the paths have changed. Also, there is no need for --preserve=links when doing cp now. So I am posting a new answer:

Install cuDNN library 7.6 for TensorFlow 2.3.1 with CUDA 10.1, in an environment created by conda create --name tfgpu10.1 python=3.8:

  1. Go to https://developer.nvidia.com/cuDNN
  2. Download "cuDNN Library for Linux" in "Download cuDNN v7.6.5 (November 5th, 2019), for CUDA 10.1"
  3. Extract using tar -xvzf cudnn-10.1-linux-x64-v7.6.5.32.tgz
  4. "Install" files:
    sudo cp cuda/include/cudnn.h /usr/lib/cuda/include/
    sudo cp cuda/lib64/libcudnn* /usr/lib/cuda/lib64/
    
  5. Set permission:
    sudo chmod a+r /usr/lib/cuda/include/cudnn.h /usr/lib/cuda/lib64/libcudnn*
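Because the install prefix differs between setups (/usr/local/cuda in the answer above, /usr/lib/cuda here), a small check like the following can show which one actually holds the cuDNN header. It is only a sketch; first_cuda_prefix is a hypothetical helper, and the candidate list is taken from the two answers.

```shell
# first_cuda_prefix PREFIX...: prints the first PREFIX that contains
# include/cudnn.h, or returns 1 if none of them does.
first_cuda_prefix() {
    local p
    for p in "$@"; do
        if [ -f "$p/include/cudnn.h" ]; then
            printf '%s\n' "$p"
            return 0
        fi
    done
    return 1
}

# Candidate prefixes from the two answers; adjust for your system.
first_cuda_prefix /usr/local/cuda /usr/lib/cuda \
    || echo "cudnn.h not found in the usual prefixes"
```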
    

Output of testing:

Python 3.8.5 (default, Sep  4 2020, 07:30:14) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2020-12-02 03:58:41.089993: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
>>> tf.config.list_physical_devices("GPU")
2020-12-02 03:58:48.538295: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-12-02 03:58:48.587523: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-02 03:58:48.587838: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1650 Ti computeCapability: 7.5
coreClock: 1.485GHz coreCount: 16 deviceMemorySize: 3.82GiB deviceMemoryBandwidth: 178.84GiB/s
2020-12-02 03:58:48.587860: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-12-02 03:58:48.589111: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-12-02 03:58:48.590284: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-12-02 03:58:48.590488: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-12-02 03:58:48.591785: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-12-02 03:58:48.592520: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-12-02 03:58:48.595129: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-12-02 03:58:48.595213: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-02 03:58:48.595555: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-02 03:58:48.595815: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Mark

I installed the nvidia-cuda-toolkit package:

$ sudo apt install nvidia-cuda-toolkit

and it worked.

I did not find the solution on either the TensorFlow website or the NVIDIA installation page. I found it by luck while looking for a way to get the CUDA version from the command line: How to get the cuda version?

vvvvv

This didn't work for me. In my case, the cause was that I had multiple versions of CUDA installed, and the cuDNN version I had was built for an older CUDA version than the one I was trying to use. I installed the cuDNN build for the new version following NVIDIA's instructions, and that fixed it for me.

Xavi