
I am a beginner when it comes to running Python code on a GPU. I have CNN code that I would like to run on the GPU. I have tensorflow-gpu, CUDA and cuDNN installed on my laptop, but the Python code does not execute on the GPU.

nvidia-smi

Below is everything I tried, together with the output:

  1. Code:

    pip freeze | grep tensorflow
    

    Output:

    tensorflow==2.0.0
    tensorflow-estimator==2.0.0
    tensorflow-gpu==2.0.0
    
  2. Code:

    nvcc --version
    

    Output:

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2019 NVIDIA Corporation
    Built on Fri_Feb__8_19:08:17_PST_2019
    Cuda compilation tools, release 10.1, V10.1.105
    
  3. Code

    cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
    

    Output

    #define CUDNN_MAJOR 7
    #define CUDNN_MINOR 5
    #define CUDNN_PATCHLEVEL 0
    #define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
    #include "driver_types.h"
    
  4. Code:

    from __future__ import absolute_import, division, print_function, unicode_literals
    import tensorflow as tf
    
    print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
    

    Output:

    Num GPUs Available:  0
    
  5. Code

    import tensorflow
    from tensorflow.python.client import device_lib
    print(device_lib.list_local_devices())
    

    Output:

    2019-10-16 22:11:15.280922: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
    2019-10-16 22:11:15.484734: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2808000000 Hz
    2019-10-16 22:11:15.508127: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x45d4c60 executing computations on platform Host. Devices:
    2019-10-16 22:11:15.508212: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Host, Default Version
    2019-10-16 22:11:15.784006: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2019-10-16 22:11:15.785226: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x45d6ad0 executing computations on platform CUDA. Devices:
    2019-10-16 22:11:15.785278: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): GeForce GTX 1060, Compute Capability 6.1
    2019-10-16 22:11:15.785605: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2019-10-16 22:11:15.786528: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
    name: GeForce GTX 1060 major: 6 minor: 1 memoryClockRate(GHz): 1.6705
    pciBusID: 0000:01:00.0
    2019-10-16 22:11:15.786826: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
    2019-10-16 22:11:15.787053: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
    2019-10-16 22:11:15.787266: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
    2019-10-16 22:11:15.787474: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
    2019-10-16 22:11:15.787682: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
    2019-10-16 22:11:15.787950: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
    2019-10-16 22:11:15.788010: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
    2019-10-16 22:11:15.788036: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
    Skipping registering GPU devices...
    2019-10-16 22:11:15.788073: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] 
    Device interconnect StreamExecutor with strength 1 edge matrix:
    2019-10-16 22:11:15.788094: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
    2019-10-16 22:11:15.788111: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
    [name: "/device:CPU:0"
    device_type: "CPU"
    memory_limit: 268435456
    locality {
    }
    incarnation: 7400412130462543104
    ,name: "/device:XLA_CPU:0"
    
    device_type: "XLA_CPU"
    memory_limit: 17179869184
    locality {
    }
    incarnation: 10419596086097903998
    physical_device_desc: "device: XLA_CPU device"
    ,name: "/device:XLA_GPU:0"
    device_type: "XLA_GPU"
    memory_limit: 17179869184
    locality {
    }
    incarnation: 10970348491339008844
    physical_device_desc: "device: XLA_GPU device"
    ]
    

I have referred to several websites, which basically say that if you have a GPU and tensorflow-gpu installed, the program will automatically detect the GPU and run the code on it. I also know that there are similar questions on StackOverflow; the code above is what I put together after reading the answers to them. I also tried this snippet from the official TensorFlow 2.0 website:

import tensorflow as tf

tf.debugging.set_log_device_placement(True)

# Create some tensors
a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
c = tf.matmul(a, b)

print(c)

Output is:

RuntimeError: Device placement logging must be set at program startup

Why is my program not executing on the GPU?

halfer
Rao208
  • See [this answer](https://stackoverflow.com/questions/58053509/why-does-my-keras-model-train-after-i-load-it-even-though-i-have-not-actually-s/58160270#58160270); order of installation matters, and so do the versions of CUDA & cuDNN in relation to TensorFlow and operating system. May or may not solve your problem. – OverLordGoldDragon Oct 16 '19 at 20:43
  • Agree with @OverLordGoldDragon. Try removing every TF installation, then installing the recommended versions of CUDA as mentioned on the TF install page, and then running TensorFlow. Or you may try a conda installation for a change; it might help you handle the CUDA installs, but be sure to update your NVIDIA driver before that. – y.selivonchyk Oct 17 '19 at 03:00

3 Answers


If you look at these lines of your log:

2019-10-16 22:11:15.786826: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
2019-10-16 22:11:15.787053: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
2019-10-16 22:11:15.787266: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
2019-10-16 22:11:15.787474: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
2019-10-16 22:11:15.787682: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
2019-10-16 22:11:15.787950: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/

The log shows that TensorFlow is looking for the CUDA 10.0 libraries, but only the CUDA 10.1 libraries are installed. So the first step would be to uninstall CUDA 10.1 and install CUDA 10.0. Also remove tensorflow and keep only tensorflow-gpu. For all the other versions, follow the exact suggestions here.
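To pull the same information out of a long log without reading it line by line, something like the following might help. It is only a sketch: the function name `missing_cuda_libraries` is made up for illustration, the regular expression assumes the warning wording shown in the question's log, and the sample log is shortened (the `dlerror` text is cut off with `...`).

```python
import re

# Assumed wording of the TensorFlow 2.0 warning, as it appears in the log above.
_MISSING = re.compile(r"Could not load dynamic library '(lib\w+)\.so\.([\d.]+)'")


def missing_cuda_libraries(log_text):
    """Return {library name: version suffix} for every failed load in the log."""
    return {name: version for name, version in _MISSING.findall(log_text)}


# Shortened sample: two failed loads and one successful load.
sample_log = (
    "W tensorflow/stream_executor/platform/default/dso_loader.cc:55] "
    "Could not load dynamic library 'libcudart.so.10.0'; dlerror: ...\n"
    "W tensorflow/stream_executor/platform/default/dso_loader.cc:55] "
    "Could not load dynamic library 'libcublas.so.10.0'; dlerror: ...\n"
    "I tensorflow/stream_executor/platform/default/dso_loader.cc:44] "
    "Successfully opened dynamic library libcudnn.so.7\n"
)

missing = missing_cuda_libraries(sample_log)
print(missing)
```

If every missing library carries the same version suffix (10.0 here), that suffix is the CUDA release this TensorFlow build was linked against.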

Let us know if that solves your issue.

Rishabh Sahrawat
  • Hi. I uninstalled CUDA 10.1, but when I type nvidia-smi in the terminal, it shows that I have CUDA 10.1. Is this normal behaviour? – Rao208 Oct 21 '19 at 17:49
  • **When I type**: ldconfig -p | grep cuda **Output is this**: libicudata.so.60 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libicudata.so.60 libicudata.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libicudata.so libcudart.so.9.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcudart.so.9.1 libcuda.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcuda.so.1 libcuda.so.1 (libc6) => /usr/lib/i386-linux-gnu/libcuda.so.1 libcuda.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcuda.so libcuda.so (libc6) => /usr/lib/i386-linux-gnu/libcuda.so Based on this output, is CUDA installed? – Rao208 Oct 21 '19 at 17:49
  • Yes, that is fine. It is the same on my machine, maybe because we both installed CUDA 10.1 first and then had to go back to 10.0, which somehow didn't change some file for 'nvidia-smi'. You can confirm your CUDA version in other ways, for example with 'nvcc --version'. For a more exhaustive list read [here](https://arnon.dk/check-cuda-installed/). Please accept my answer if it has solved your problem. :) – Rishabh Sahrawat Oct 21 '19 at 17:56
  • Great! Thanks :) nvcc --version is giving me the output **command 'nvcc' not found, but can be installed with: sudo apt install nvidia-cuda-toolkit** So, well CUDA 10.1 is not installed. – Rao208 Oct 21 '19 at 17:59
  • That might be a path file problem. Did you set the path for CuDNN? – Rishabh Sahrawat Oct 21 '19 at 18:02
  • Edit: I opened the bashrc and found the following command: export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64" export CUDA_HOME=/usr/local/cuda export PATH="/usr/local/cuda/bin:$PATH" # Virtual Environment Wrapper export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.1/lib64/ Should I remove the above command? I am just following the instruction from the link that you provided and what you said (i.e. to uninstall the cuda 10.1) I haven't installed CUDA 10.0 yet. – Rao208 Oct 21 '19 at 18:10
  • Ah okay. Ja then the error was reasonable haha. Install 10.0 and check – Rishabh Sahrawat Oct 21 '19 at 18:11
  • Thank you @Rishabh Sahrawat. – Rao208 Oct 23 '19 at 12:24

Rishabh Sahrawat's answer worked for me. It took me a very long time to figure out how to uninstall CUDA 10.1 and install CUDA 10.0. While this is pretty informative, I was still struggling to get all the installations right, as I kept getting package errors (sigh), NVIDIA driver errors, dpkg errors, etc. I thought it would be nice to gather everything in one place and guide others (beginners like me) who are probably facing the same difficulties. The following commands fixed the errors for me. Some of them are already mentioned in the question, but I have repeated them here for completeness. I hope this helps.

1. How to uninstall CUDA?

dpkg -l | grep cuda- | awk '{print $2}' | xargs -n1 sudo dpkg --purge --force-all
sudo apt-get remove cuda-*

2. How to check whether CUDA is installed or uninstalled?

Command:

nvcc --version

Output (if uninstalled)

command 'nvcc' not found, but can be installed with sudo apt install nvidia-cuda-toolkit

Output (if installed)

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
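The same check can be scripted. This is a rough sketch, not an official tool: the helper names `parse_nvcc_release` and `installed_cuda_release` are my own, and the parser assumes the "release X.Y" wording shown in the output above.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Extract the release string (e.g. '10.0') from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+)", output)
    return match.group(1) if match else None


def installed_cuda_release():
    """Return the toolkit release reported by nvcc, or None if nvcc is not on PATH."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


# Last line of the "installed" output shown above.
sample = "Cuda compilation tools, release 10.0, V10.0.130"
print(parse_nvcc_release(sample))
print(installed_cuda_release())
```

A result of `None` from `installed_cuda_release()` only means that `nvcc` is not reachable through PATH; the toolkit may still be on disk, which is what the next step is about.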

3. In case of the error bash: /usr/bin/nvcc: No such file or directory

Check the path in .bashrc. One can also refer to this link
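To see what the paths exported in .bashrc actually resolve to, a small script along these lines may be useful. It is a sketch under assumptions: the helper `dirs_containing` is hypothetical, and it only inspects the PATH and LD_LIBRARY_PATH environment variables (the dynamic loader also consults the ldconfig cache, which this does not cover).

```python
import glob
import os


def dirs_containing(path_value, pattern):
    """Return the directories in a path list that hold a file matching pattern."""
    hits = []
    for directory in path_value.split(os.pathsep):
        # Skip empty entries, which appear when the variable starts or ends with ':'.
        if directory and glob.glob(os.path.join(directory, pattern)):
            hits.append(directory)
    return hits


print("nvcc found in:", dirs_containing(os.environ.get("PATH", ""), "nvcc"))
print("libcudart found in:",
      dirs_containing(os.environ.get("LD_LIBRARY_PATH", ""), "libcudart.so*"))
```

An empty list for `nvcc` points at a missing or stale PATH entry; an empty list for `libcudart` means none of the LD_LIBRARY_PATH directories contain the CUDA runtime.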

4. How to remove an old NVIDIA driver version?

Command

sudo apt-get --purge remove "*nvidia*"

5. How to check if the driver is installed?

Command

nvidia-smi

6. In the case of Error message “Sub-process /usr/bin/dpkg returned an error code (1)”

dpkg error

One can also try:

sudo apt-get install freeglut3 freeglut3-dev libxi-dev libxmu-dev
apt --fix-broken install # (if it doesn't work, try it as root)

7. How to install CUDA?

I used the following command instead of step 4 in CUDA installation

sudo apt-get install cuda-10-0

8. How to install cuDNN?

Download cuDNN Library for Linux

# Unpack the archive

tar -zxvf cudnn-10.0-linux-x64-v7.6.4.38.tgz

# Move the unpacked contents to your CUDA directory

sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda-10.0/lib64/
sudo cp  cuda/include/cudnn.h /usr/local/cuda-10.0/include/

# Give read access to all users

sudo chmod a+r /usr/local/cuda-10.0/include/cudnn.h /usr/local/cuda-10.0/lib64/libcudnn*
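After copying the files, the installed cuDNN version can be read back from the header, which is the same information the `grep CUDNN_MAJOR` command in the question prints. A minimal sketch, assuming the cuDNN 7 layout where the version macros live in `cudnn.h` (in cuDNN 8 and later they moved to `cudnn_version.h`); the function name `parse_cudnn_version` is made up:

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from the #define lines of a cuDNN header, or None."""
    fields = {}
    for key in ("MAJOR", "MINOR", "PATCHLEVEL"):
        match = re.search(r"#\s*define\s+CUDNN_%s\s+(\d+)" % key, header_text)
        if match is None:
            return None
        fields[key] = int(match.group(1))
    return fields["MAJOR"], fields["MINOR"], fields["PATCHLEVEL"]


# Sample text in the format of cudnn.h; values match the v7.6.4 archive used above.
sample_header = """
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 4
"""
print(parse_cudnn_version(sample_header))
```

To check the real installation, pass in the contents of `/usr/local/cuda-10.0/include/cudnn.h` instead of the sample text.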

One can also use the following links (they did not work for me, but they are worth trying):

  1. I ended up installing CUDA 10.1 by following the steps in the link.
  2. I could not create a new file, /etc/profile.d/cuda.sh as suggested in this link
  3. This link is good too.

Once everything is installed and tensorflow is uninstalled (keep only tensorflow-gpu), the code will run on the GPU.

How to ensure tensorflow is using the GPU
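One quick check, reusing the call from the question, is to count the GPUs TensorFlow registers. This is just a sketch: the wrapper `count_visible_gpus` is my own name, and it returns `None` when TensorFlow cannot be imported at all.

```python
def count_visible_gpus():
    """Return the number of GPUs TensorFlow registers, or None if it is not importable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Same call as in the question (TensorFlow 2.0 API).
    return len(tf.config.experimental.list_physical_devices("GPU"))


print("GPUs visible to TensorFlow:", count_visible_gpus())
```

A result of 0 means TensorFlow imported but skipped GPU registration, as in the question; the log printed during import should then name the libraries it could not load.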

Note: if you get an import error while importing tensorflow, the following worked for me:

pip uninstall tensorflow
pip uninstall tensorflow-gpu

pip install tensorflow-gpu
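To confirm afterwards that only one of the two packages is left, the output of `pip freeze` can be checked with a few lines of Python. A sketch only: `installed_tensorflow_packages` is a hypothetical helper, and the sample text is the `pip freeze` output from the question.

```python
def installed_tensorflow_packages(freeze_output):
    """Map name to version for tensorflow / tensorflow-gpu lines in `pip freeze` output."""
    wanted = {"tensorflow", "tensorflow-gpu"}
    found = {}
    for line in freeze_output.splitlines():
        name, sep, version = line.strip().partition("==")
        if sep and name.lower() in wanted:
            found[name.lower()] = version
    return found


# pip freeze output from the question: both packages are present.
sample = "tensorflow==2.0.0\ntensorflow-estimator==2.0.0\ntensorflow-gpu==2.0.0\n"
packages = installed_tensorflow_packages(sample)
print(packages)
if len(packages) == 2:
    print("Both tensorflow and tensorflow-gpu are installed.")
```

With the reinstall above, the result should contain only `tensorflow-gpu`.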

Additional information:

1. To check Ubuntu kernel version:

uname -sr
uname -r
uname -a

2. To install the GCC

Enjoy :)

Rao208

If none of the above works, try installing tensorflow-gpu with conda instead of pip. For some reason, pip install tensorflow-gpu doesn't work as expected.

conda install tensorflow-gpu
mallet