I ran a command that is supposed to show whether TensorFlow is using the GPU or not:

>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

The command returns the output below, and I cannot tell from it whether TensorFlow is using the GPU or not.

2019-09-25 17:08:47.509729: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: Quadro P4000 major: 6 minor: 1 memoryClockRate(GHz): 1.48
pciBusID: 0000:03:00.0
2019-09-25 17:08:47.509929: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
2019-09-25 17:08:47.510040: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory
2019-09-25 17:08:47.510139: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory
2019-09-25 17:08:47.510234: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory
2019-09-25 17:08:47.510328: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory
2019-09-25 17:08:47.510440: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory
2019-09-25 17:08:47.510483: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-09-25 17:08:47.510498: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2019-09-25 17:08:47.510524: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-09-25 17:08:47.510536: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2019-09-25 17:08:47.510556: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
2019-09-25 17:08:47.510713: I tensorflow/core/common_runtime/direct_session.cc:296] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device

The nvidia-smi command returns this:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 387.26                 Driver Version: 387.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P4000        Off  | 00000000:03:00.0  On |                  N/A |
| 46%   40C    P8    11W / 105W |   1240MiB /  8111MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1943      G   /usr/libexec/Xorg                            317MiB |
|    0      2047      G   /usr/bin/gnome-shell                           5MiB |
|    0      5505      G   /usr/libexec/Xorg                            109MiB |
|    0     10091      G   /usr/libexec/Xorg                            165MiB |
|    0     10952      G   ...uest-channel-token=14061294616847102337    87MiB |
+-----------------------------------------------------------------------------+
jax

1 Answer

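TensorFlow goes through several stages before it runs a learning kernel (function) on the GPU:

  1. check which devices are physically available
  2. check for the library files those devices need (CUDA libraries, cuDNN files, etc.)
  3. allocate the needed memory
  4. start the process

Yours is stuck at stage 2: the log shows that libcudart.so.10.0 and the other CUDA 10.0 libraries could not be opened, so TensorFlow skipped registering the GPU device.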
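To see exactly which libraries stage 2 tripped over, the session log can be scanned for its "Could not dlopen library" lines. This is a minimal sketch that assumes the log format shown in the question; the function name `missing_gpu_libraries` is made up for illustration and is not part of TensorFlow.

```python
import re

# Matches log lines such as:
#   ... Could not dlopen library 'libcudart.so.10.0'; dlerror: ...
_DLOPEN_FAILURE = re.compile(r"Could not dlopen library '([^']+)'")


def missing_gpu_libraries(log_text):
    """Return the library names the log says could not be loaded."""
    return _DLOPEN_FAILURE.findall(log_text)


if __name__ == "__main__":
    # Two shortened lines in the style of the log from the question.
    sample = (
        "I tensorflow/stream_executor/platform/default/dso_loader.cc:53] "
        "Could not dlopen library 'libcudart.so.10.0'; dlerror: ...\n"
        "I tensorflow/stream_executor/platform/default/dso_loader.cc:42] "
        "Successfully opened dynamic library libcudnn.so.7\n"
    )
    for name in missing_gpu_libraries(sample):
        print("missing:", name)
```

Run against the full log in the question, this would pick out the six CUDA 10.0 libraries (libcudart, libcublas, libcufft, libcurand, libcusolver, libcusparse), while libcudnn.so.7 loaded fine.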

Running TensorFlow on a GPU needs a few things:

  1. NVIDIA driver
  2. CUDA toolkit (includes the nvcc compiler)
  3. cuDNN library files

How to test them:

  1. run nvidia-smi to check that the driver is available
  2. run nvcc --version to check that the CUDA compiler is available
  3. run import tensorflow as tf
  4. run a session in TensorFlow

If you get an error at step 3, it is because TensorFlow did not find the cuDNN files.

If you get an error at step 4, it is because of a version incompatibility between TensorFlow and the installed CUDA/cuDNN libraries; you can check this solution on Stack Overflow, or search for the specific error message.
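The first checks above can be scripted without importing TensorFlow at all. The sketch below is only an illustration: the helper names are hypothetical, and the library names are copied from the log in the question, so the version suffixes are an assumption you should adjust to whatever your TensorFlow build asks for. It uses `ctypes.CDLL`, which goes through the same dynamic loader that produced the dlopen messages.

```python
import ctypes
import shutil

# Library names taken from the log in the question (an assumption);
# change the version suffixes to match your TensorFlow build.
CUDA_LIBS = ["libcudart.so.10.0", "libcublas.so.10.0", "libcudnn.so.7"]


def can_load(lib_name):
    """Return True if the dynamic loader can open lib_name."""
    try:
        ctypes.CDLL(lib_name)
        return True
    except OSError:
        return False


def check_gpu_prerequisites(libs=CUDA_LIBS):
    """Report which command-line tools and shared libraries are reachable."""
    report = {
        "nvidia-smi": shutil.which("nvidia-smi") is not None,  # driver tools
        "nvcc": shutil.which("nvcc") is not None,              # CUDA compiler
    }
    for lib in libs:
        report[lib] = can_load(lib)
    return report


if __name__ == "__main__":
    for name, ok in check_gpu_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

If a library is installed but still reported as missing, its directory is probably not on the loader's search path (for example `LD_LIBRARY_PATH` on Linux).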

How to know when TensorFlow is not using the GPU

There are several ways to tell:

  • the log contains a line like the one in yours: Cannot dlopen some GPU libraries. Skipping registering GPU devices...

  • it can be detected from the nvidia-smi command: if you are using Python and TensorFlow is on the GPU, the Processes table will show a python process with very large memory consumption (by default it takes almost all of your available GPU memory). If no such process appears, as in your output, TensorFlow is not using the GPU.
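The nvidia-smi check can also be automated by reading the Processes table. This is a rough sketch, assuming the older table layout shown in the question (newer driver versions print extra columns, so the pattern would need adjusting) and assuming that CUDA workloads appear with type C while display processes such as Xorg appear with type G. The function names are illustrative only.

```python
import re
import shutil
import subprocess

# One row of the "Processes" table in the layout shown in the question:
# | GPU  PID  Type  Process name  Usage |
_ROW = re.compile(r"^\|\s+(\d+)\s+(\d+)\s+(C\+G|C|G)\s+(.+?)\s+(\d+)MiB\s+\|$")


def parse_processes(smi_text):
    """Parse the process rows out of plain nvidia-smi output."""
    rows = []
    for line in smi_text.splitlines():
        m = _ROW.match(line.strip())
        if m:
            gpu, pid, kind, name, mem = m.groups()
            rows.append({"gpu": int(gpu), "pid": int(pid), "type": kind,
                         "name": name, "mem_mib": int(mem)})
    return rows


def compute_processes(smi_text):
    """Keep only compute (CUDA) processes, i.e. type C or C+G."""
    return [p for p in parse_processes(smi_text) if "C" in p["type"]]


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found")
    else:
        out = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
        procs = compute_processes(out)
        if not procs:
            print("no compute process on the GPU")
        for p in procs:
            print(p["pid"], p["name"], p["mem_mib"], "MiB")
```

Run it while your TensorFlow script is active; an empty result matches the situation in the question, where only type G processes are listed.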

a-sam