I installed CUDA 8.0 and the GPU version of TensorFlow on Ubuntu 16.04. It initially worked fine and used the GPU, but it has suddenly stopped using it. I installed TensorFlow through pip, and it must be the GPU build, since it used the GPU at first.

The message I get while importing tensorflow is:

>>> import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:126] Couldn't open CUDA library libcudnn.so.5. LD_LIBRARY_PATH: :/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/lib/x86_64-linux-gnu
I tensorflow/stream_executor/cuda/cuda_dnn.cc:3517] Unable to load cuDNN DSO
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
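
The "Couldn't open CUDA library libcudnn.so.5" line above can be cross-checked without TensorFlow. The following is a minimal sketch, assuming the library is expected in one of the directories listed in LD_LIBRARY_PATH; `find_library` is a hypothetical helper name, not part of TensorFlow or CUDA. It skips the empty and repeated entries that appear in the path printed in the log.

```python
import os
from pathlib import Path


def find_library(prefix, search_path=None):
    """Return files whose name starts with `prefix` in each LD_LIBRARY_PATH directory.

    `find_library` is a hypothetical helper for illustration only.
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    seen = set()
    matches = []
    for entry in search_path.split(":"):
        # Skip empty entries (a leading ':' produces one) and repeated directories.
        if not entry or entry in seen:
            continue
        seen.add(entry)
        directory = Path(entry)
        if directory.is_dir():
            matches.extend(sorted(str(p) for p in directory.glob(prefix + "*")))
    return matches


if __name__ == "__main__":
    hits = find_library("libcudnn.so")
    print("\n".join(hits) if hits else "no libcudnn.so* found on LD_LIBRARY_PATH")
```

Note that the dynamic loader also consults the ldconfig cache and the default system directories, so an empty result here does not by itself prove the library is absent from the machine.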

So it is clearly able to locate the CUDA libraries from LD_LIBRARY_PATH. But when I create a session, I get the following output:

>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
E tensorflow/stream_executor/cuda/cuda_driver.cc:509] failed call to cuInit: CUDA_ERROR_UNKNOWN
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: naman-pc
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: naman-pc
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: 375.39.0
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:363] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module  375.39  Tue Jan 31 20:47:00 PST 2017
GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) 
"""
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: 375.39.0
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:300] kernel version seems to match DSO: 375.39.0
Device mapping: no known devices.
I tensorflow/core/common_runtime/direct_session.cc:257] Device mapping:
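
The "failed call to cuInit: CUDA_ERROR_UNKNOWN" line in the log above comes from the CUDA driver API rather than from TensorFlow itself. As a rough sketch, assuming `libcuda.so.1` is loadable as the import log reports, the same call can be made directly through `ctypes` to see whether the failure reproduces outside TensorFlow. `try_cu_init` is a hypothetical helper name.

```python
import ctypes

CUDA_SUCCESS = 0
CUDA_ERROR_UNKNOWN = 999  # value of CUDA_ERROR_UNKNOWN in the driver API's CUresult enum


def try_cu_init(lib_name="libcuda.so.1"):
    """Call cuInit(0) directly.

    Returns the integer CUresult code, or None if the library cannot be loaded.
    `try_cu_init` is a hypothetical helper for illustration only.
    """
    try:
        libcuda = ctypes.CDLL(lib_name)
    except OSError:
        return None
    # Driver API signature: CUresult cuInit(unsigned int Flags)
    libcuda.cuInit.argtypes = [ctypes.c_uint]
    libcuda.cuInit.restype = ctypes.c_int
    return libcuda.cuInit(0)


if __name__ == "__main__":
    code = try_cu_init()
    if code is None:
        print("libcuda.so.1 could not be loaded")
    elif code == CUDA_SUCCESS:
        print("cuInit succeeded")
    else:
        print("cuInit failed with CUresult", code)
```

If this also returns 999, the problem most likely lies with the driver or its kernel module state rather than with the TensorFlow installation.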

So it's not able to locate the GPU. nvidia-smi gives the following output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Graphics Device     Off  | 0000:01:00.0      On |                  N/A |
| 23%   41C    P8    11W / 250W |    337MiB / 11169MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1005    G   /usr/lib/xorg/Xorg                             197MiB |
|    0      2032    G   ...s-passed-by-fd --v8-snapshot-passed-by-fd    89MiB |
|    0     30355    G   compiz                                          37MiB |
+-----------------------------------------------------------------------------+

I browsed other questions on Stack Overflow, but they mostly suggest checking LD_LIBRARY_PATH or nvidia-smi. For me, both look as expected, so I can't work out what the issue is.

EDIT: I also tried installing cuDNN 5 and adding it to LD_LIBRARY_PATH. TensorFlow now loads it successfully, but I still get the same error when creating a session.

Naman
  • You need to install cuDNN library – Ivan Aksamentov - Drop Apr 19 '17 at 06:10
  • @Drop I installed and I know that my LD_LIBRARY_PATH is not pointing to it. But shouldn't this still run without that? I am sure it was running without that but somehow something screwed up later. – Naman Apr 19 '17 at 06:12
  • @Drop Also, I don't know why it wants only libcudnn.so.5 and not 6. I have 6 installed already and I didn't want to downgrade. – Naman Apr 19 '17 at 06:16
  • What I see is the log saying that cuDNN is missing. It asks for version 5 because your distribution of TF was linked against this version. You may rebuild TF against v6 if you wish (not sure yet if it is supported though). Also check if any of [these](http://stackoverflow.com/questions/37660312/run-tensorflow-on-cpu/37660913#37660913) are enabled, preventing TF from seeing GPUs. – Ivan Aksamentov - Drop Apr 19 '17 at 17:01
  • Another strange thing is that nvidia-smi cannot resolve the name of the device (I see "Graphics Device" there). 250W and 12Gb, is it Titan X or Tesla? You may also want to check that the driver is installed correctly. – Ivan Aksamentov - Drop Apr 19 '17 at 17:05
  • @Drop hang on, I will check it as soon as I get to my home desktop (currently at work), but I really need this help. The last alternative would be to delete everything and reinstall. :( . The graphic card is gtx 1080 ti . Most surprisingly, everything worked when I first installed in one go but suddenly it has stopped working. Can't remember what I changed. – Naman Apr 20 '17 at 00:17
  • @Drop Weirdly, this got solved automatically after a restart. I can't see any change; the nvidia-smi output looks exactly the same. Probably something related to PATH / LD_LIBRARY_PATH got messed up. But I am worried that it might happen again. – Naman Apr 20 '17 at 06:04
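
The comments above suggest checking for settings that prevent TensorFlow from seeing GPUs. One such setting for CUDA applications in general is the `CUDA_VISIBLE_DEVICES` environment variable; whether it is among the settings the linked answer covers is an assumption here. A minimal sketch of that check, with `describe_visible_devices` as a hypothetical helper name:

```python
import os


def describe_visible_devices(environ=None):
    """Summarize what CUDA_VISIBLE_DEVICES implies for GPU visibility.

    `describe_visible_devices` is a hypothetical helper for illustration only.
    """
    environ = os.environ if environ is None else environ
    value = environ.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return "unset: all GPUs are visible"
    if value.strip() in ("", "-1"):
        # An empty value or an invalid index such as -1 hides every GPU.
        return "set so that all GPUs are hidden"
    return "restricted to devices: " + value


if __name__ == "__main__":
    print("CUDA_VISIBLE_DEVICES is", describe_visible_devices())
```

Running this in the same shell that launches Python shows whether a stale export is hiding the card from that process.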

1 Answer

Simply rename "cudnn64_6.dll" to "cudnn64_5.dll".

Smakosh
  • How/why would this fix the problem? Can you please expand your answer a bit into something more useful? – Cody Gray - on strike Aug 02 '17 at 17:49
  • When you downloaded the cuDNN 6.0 zip file, you found a file named "cudnn64_6.dll" inside the bin folder, right? Rename that to "cudnn64_5.dll" and everything should work, provided you have actually installed the tensorflow-gpu version. – Smakosh Aug 02 '17 at 22:54