
I have an application that uses an object detection model built with Keras and TensorFlow, and I am sure that I have tensorflow-gpu installed. When the application runs, my GPU is not utilized as expected. Following this post/answer, I tried to verify that TensorFlow can use my GPU, but the check fails with an error indicating that CUDA isn't enabled for my GPU (i.e. "The requested device appears to be a GPU, but CUDA is not enabled."):

$ python 
Python 3.7.4 (default, Jul  9 2019, 15:11:16) 
[GCC 7.4.0] on linux
>>> import tensorflow as tf
>>> with tf.device('/gpu:0'):
...     a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
...     b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
...     c = tf.matmul(a, b)
... 
>>> with tf.Session() as sess:
...     print(sess.run(c))
... 
2019-07-24 20:31:37.175391: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-07-24 20:31:37.204704: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2208000000 Hz
2019-07-24 20:31:37.207126: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x560dd3d22490 executing computations on platform Host. Devices:
2019-07-24 20:31:37.207196: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
Traceback (most recent call last):
  File "/home/james/.virtualenvs/deep_monitor_venv/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/home/james/.virtualenvs/deep_monitor_venv/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1339, in _run_fn
    self._extend_graph()
  File "/home/james/.virtualenvs/deep_monitor_venv/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1374, in _extend_graph
    tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation MatMul: {{node MatMul}}was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0 ]. Make sure the device specification refers to a valid device. The requested device appears to be a GPU, but CUDA is not enabled.
     [[MatMul]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/home/james/.virtualenvs/deep_monitor_venv/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/home/james/.virtualenvs/deep_monitor_venv/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/james/.virtualenvs/deep_monitor_venv/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/home/james/.virtualenvs/deep_monitor_venv/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation MatMul: node MatMul (defined at <stdin>:4) was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0 ]. Make sure the device specification refers to a valid device. The requested device appears to be a GPU, but CUDA is not enabled.
     [[MatMul]]

Errors may have originated from an input operation.
Input Source operations connected to node MatMul:
 a (defined at <stdin>:2)   
 b (defined at <stdin>:3)
>>> 
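One way to narrow down an error like the one above is to ask the imported TensorFlow build what it reports about itself before placing any operation on a device. The sketch below is a minimal check, assuming `tf.test.is_built_with_cuda()` and `device_lib.list_local_devices()` behave in 1.14 as they do in other 1.x releases; the helper name `describe_tensorflow_devices` is hypothetical.

```python
def describe_tensorflow_devices():
    """Return what the imported TensorFlow build reports about GPU support.

    Returns None when TensorFlow cannot be imported at all.
    """
    try:
        import tensorflow as tf
        from tensorflow.python.client import device_lib
    except ImportError:
        return None

    return {
        "version": tf.__version__,
        # False means the binary that was imported has no CUDA support.
        "built_with_cuda": tf.test.is_built_with_cuda(),
        # Device names such as "/device:CPU:0" or "/device:GPU:0".
        "devices": [d.name for d in device_lib.list_local_devices()],
    }


if __name__ == "__main__":
    print(describe_tensorflow_devices())
```

If `built_with_cuda` comes back `False`, the problem would lie with the installed TensorFlow package rather than with the CUDA toolkit; if it is `True` but no GPU device is listed, the CUDA libraries are the more likely suspect.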

However, from what I can tell the CUDA toolkit is installed correctly and CUDA is enabled on the GPU:

$ ll /usr/local/cuda
lrwxrwxrwx 1 root root 9 Jun 12 15:59 /usr/local/cuda -> cuda-10.1/

$ nvidia-smi
Wed Jul 24 16:14:15 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  On   | 00000000:01:00.0 Off |                  N/A |
| N/A   43C    P0    N/A /  N/A |    830MiB /  4042MiB |      6%      Default |
+-------------------------------+----------------------+----------------------+
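Note that the "CUDA Version" field in `nvidia-smi` reflects what the driver supports, not which toolkit is installed, so it does not by itself show which runtime libraries a process can load. A minimal sketch of a more direct check follows. The library names are an assumption (that TensorFlow 1.14 binaries look for the CUDA 10.0 runtime and cuDNN 7), and `loadable_libraries` is a hypothetical helper.

```python
import ctypes

# Assumption: these are the shared libraries a TensorFlow 1.14 GPU build needs.
ASSUMED_LIBS = ("libcudart.so.10.0", "libcublas.so.10.0", "libcudnn.so.7")


def loadable_libraries(names=ASSUMED_LIBS):
    """Map each library name to True if the dynamic loader can open it."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)  # uses the same search path as the process
            results[name] = True
        except OSError:
            results[name] = False
    return results


if __name__ == "__main__":
    for name, ok in loadable_libraries().items():
        print(name, "found" if ok else "NOT found")
```

Running this inside the same virtual environment and shell as the application matters, because the result depends on that process's `LD_LIBRARY_PATH`.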

My virtual environment's installed packages:

$ pip list
Package              Version 
-------------------- --------
absl-py              0.7.1   
arrow                0.14.2  
astor                0.8.0   
Cython               0.29.12 
gast                 0.2.2   
google-pasta         0.1.7   
grpcio               1.22.0  
h5py                 2.9.0   
imutils              0.5.2   
joblib               0.13.2  
Keras                2.2.4   
Keras-Applications   1.0.8   
Keras-Preprocessing  1.1.0   
keras-resnet         0.2.0   
keras-retinanet      0.5.1   
Markdown             3.1.1   
numpy                1.16.4  
opencv-python        4.1.0.25
Pillow               6.1.0   
pip                  19.2.1  
progressbar2         3.42.0  
protobuf             3.9.0   
python-dateutil      2.8.0   
python-utils         2.3.0   
PyYAML               5.1.1   
scikit-learn         0.21.2  
scipy                1.3.0   
setuptools           41.0.1  
six                  1.12.0  
SQLAlchemy           1.3.6   
SQLAlchemy-Utils     0.34.1  
tensorboard          1.14.0  
tensorflow           1.14.0  
tensorflow-estimator 1.14.0  
tensorflow-gpu       1.14.0  
termcolor            1.1.0   
Werkzeug             0.15.5  
wget                 3.2     
wheel                0.33.4  
wrapt                1.11.2
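The listing above shows both `tensorflow` and `tensorflow-gpu` at 1.14.0 in the same environment. Whether that matters here is not established in this thread; it is only an assumption that a CPU-only package could take precedence when both are installed. The sketch below simply flags the situation from `pip list` output, and `installed_tensorflow_variants` is a hypothetical helper.

```python
def installed_tensorflow_variants(pip_list_text):
    """Return the TensorFlow distributions named in `pip list` output."""
    variants = {"tensorflow", "tensorflow-gpu"}
    found = []
    for line in pip_list_text.splitlines():
        parts = line.split()
        # The first column of `pip list` is the package name.
        if parts and parts[0].lower() in variants:
            found.append(parts[0].lower())
    return sorted(found)


sample = """\
tensorboard          1.14.0
tensorflow           1.14.0
tensorflow-estimator 1.14.0
tensorflow-gpu       1.14.0
"""
print(installed_tensorflow_variants(sample))  # ['tensorflow', 'tensorflow-gpu']
```

If both names appear, one way to rule this out is to test in a fresh environment with only `tensorflow-gpu` installed.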

My operating system is Ubuntu 18.04.2 LTS.

Joe Wood
James Adams
  • Your CUDA version is 10.1. Yes, it matters. Every TF build is compiled against a specific CUDA version and requires that version in order to use the GPU. You appear to have tensorflow-gpu 1.14.0, which seems to require CUDA 10.0. CUDA 10.1 cannot be used as a replacement for CUDA 10.0. Your question may very well be a duplicate of [this one](https://stackoverflow.com/questions/56786677/tensorflow-1-14-0-is-not-using-gpu). – Robert Crovella Jul 25 '19 at 01:02
  • Thank you, Robert. I will downgrade CUDA to version 10.0. I appreciate your help. – James Adams Jul 25 '19 at 13:10
  • I still get the same error when I try the simple tensor operation described above after I downgraded to CUDA version 10.0 as instructed here: https://www.tensorflow.org/install/gpu – James Adams Jul 25 '19 at 13:48
  • What's interesting is that the output of `nvidia-smi` still shows CUDA version 10.1. Perhaps I need to somehow reset my GPU so it recognizes the installed version of CUDA? – James Adams Jul 25 '19 at 14:06
  • You have not installed CUDA anywhere that the anaconda environment can find. Try installing the cudatoolkit and cudnn with conda. – talonmies Jul 25 '19 at 16:06
  • Thank you @talonmies. It seems that I'm past this now after modifying my `PATH` to include `/usr/local/cuda-10.0/bin` instead of `/usr/local/cuda-10.1/bin`. After I did that I created a conda environment and a vanilla Python virtual environment. For both I only installed `tensorflow-gpu` then ran the simple tensor operation above and it worked as expected for both environments. Is it likely that the PATH update is corrective here, or is it perhaps something else I've done as I've bumbled my way through this (various cuda/nvidia reinstalls using `apt`, etc.)? There's probably no way to tell... – James Adams Jul 25 '19 at 17:30
  • What I actually changed was my `CUDA_HOME`, and this is used in both the `PATH` and `LD_LIBRARY_PATH` environment variables, and I'm now thinking it's the latter of these which might actually be at play here. – James Adams Jul 25 '19 at 19:20
  • If TensorFlow and CUDA/cuDNN are on different paths, tensorflow-gpu won't recognize the CUDA installation. Both should be in the same environment or on the same PATH. For more details, refer to [this](https://www.tensorflow.org/install/gpu#linux_setup). Thanks! –  Jan 21 '21 at 05:08
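Since the comments above point to `PATH` and `LD_LIBRARY_PATH` (via `CUDA_HOME`) as the likely fix, it can help to print which CUDA directories those variables actually contain in the shell that launches Python. This is a minimal sketch, with `cuda_entries` a hypothetical helper that only does a substring match on "cuda".

```python
import os


def cuda_entries(path_value):
    """Return entries of a colon-separated path variable that mention CUDA."""
    return [p for p in path_value.split(":") if "cuda" in p.lower()]


if __name__ == "__main__":
    print("CUDA_HOME", os.environ.get("CUDA_HOME"))
    for var in ("PATH", "LD_LIBRARY_PATH"):
        print(var, cuda_entries(os.environ.get(var, "")))
```

An entry under `/usr/local/cuda-10.1` where `/usr/local/cuda-10.0` was intended would be consistent with the behavior described in the thread.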

0 Answers