
(Similar to this thread but not quite the same)

For a project I am forced to run some code with TensorFlow version 1.9.0. I created a virtualenv and installed all the required packages. The problem is that, while my code runs on the GPUs without problems in my standard environment, TensorFlow fails to detect any GPU when I switch to the virtualenv.

Here are the details:

Standard environment:

TensorFlow version:

import tensorflow as tf
tf.__version__

'2.6.2'

Available devices:

from tensorflow.python.client import device_lib 
print(device_lib.list_local_devices())

[name: "/device:CPU:0"
 device_type: "CPU"
 memory_limit: 268435456
 locality { }
 incarnation: 11602830920464884706,
 name: "/device:GPU:0"
 device_type: "GPU"
 memory_limit: 40294875136
 locality { bus_id: 1
            links { link { device_id: 1 type: "StreamExecutor" strength: 1 }
                    link { device_id: 2 type: "StreamExecutor" strength: 1 }
                    link { device_id: 3 type: "StreamExecutor" strength: 1 } } }
 incarnation: 15663242159133845284
 physical_device_desc: "device: 0, name: NVIDIA A100-PCIE-40GB, pci bus id: 0000:18:00.0, compute capability: 8.0",
 name: "/device:GPU:1"
 device_type: "GPU"
 memory_limit: 40294875136
 locality { bus_id: 1
            links { link { type: "StreamExecutor" strength: 1 }
                    link { device_id: 2 type: "StreamExecutor" strength: 1 }
                    link { device_id: 3 type: "StreamExecutor" strength: 1 } } }
 incarnation: 2349170340922218139
 physical_device_desc: "device: 1, name: NVIDIA A100-PCIE-40GB, pci bus id: 0000:3b:00.0, compute capability: 8.0",
 name: "/device:GPU:2"
 device_type: "GPU"
 memory_limit: 40294875136
 locality { bus_id: 2 numa_node: 1
            links { link { type: "StreamExecutor" strength: 1 }
                    link { device_id: 1 type: "StreamExecutor" strength: 1 }
                    link { device_id: 3 type: "StreamExecutor" strength: 1 } } }
 incarnation: 14414713532534108564
 physical_device_desc: "device: 2, name: NVIDIA A100-PCIE-40GB, pci bus id: 0000:86:00.0, compute capability: 8.0",
 name: "/device:GPU:3"
 device_type: "GPU"
 memory_limit: 40294875136
 locality { bus_id: 2 numa_node: 1
            links { link { type: "StreamExecutor" strength: 1 }
                    link { device_id: 1 type: "StreamExecutor" strength: 1 }
                    link { device_id: 2 type: "StreamExecutor" strength: 1 } } }
 incarnation: 9097569354751962058
 physical_device_desc: "device: 3, name: NVIDIA A100-PCIE-40GB, pci bus id: 0000:af:00.0, compute capability: 8.0"]

CUDA status:

tf.test.is_gpu_available(
    cuda_only=False, min_cuda_compute_capability=None)

WARNING:tensorflow:From <ipython-input-5-97ecbf874269>:2: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version. Instructions for updating: Use `tf.config.list_physical_devices('GPU')` instead.

True

Virtualenv:

TensorFlow version:

import tensorflow as tf
tf.__version__

'1.9.0'

Available devices:

from tensorflow.python.client import device_lib 
print(device_lib.list_local_devices())

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 7502529416575806022
]

CUDA status:

tf.test.is_gpu_available(
    cuda_only=False, min_cuda_compute_capability=None)

False
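One hedged way to narrow this down without relying on TensorFlow's own logging: to the best of my knowledge, TF 1.9 wheels were built against CUDA 9.0 and cuDNN 7, so the interpreter inside the virtualenv must be able to dlopen those exact sonames. Note that the `CUDA Version: 11.5` shown by `nvidia-smi` below is only the newest runtime the *driver* supports, not what is installed on disk. A minimal sketch of the check (the listed sonames are assumptions based on TF 1.9's build configuration):

```python
import ctypes

# Sonames TF 1.9 is believed to request at import time (assumption: a
# CUDA 9.0 / cuDNN 7 build). If any of these is "NOT loadable", the GPU
# wheel silently falls back to CPU-only device enumeration.
SONAMES = ("libcudart.so.9.0", "libcublas.so.9.0", "libcudnn.so.7")

results = {}
for soname in SONAMES:
    try:
        ctypes.CDLL(soname)          # same mechanism TF uses: dlopen()
        results[soname] = True
    except OSError:
        results[soname] = False
    print(soname, "->", "loadable" if results[soname] else
          "NOT loadable (check LD_LIBRARY_PATH)")
```

If a soname is not loadable, adding its directory to `LD_LIBRARY_PATH` before launching Python inside the virtualenv is the usual fix.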

This is the output of nvidia-smi (exactly the same for both environments):

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.29.05    Driver Version: 495.29.05    CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-PCI...  On   | 00000000:18:00.0 Off |                    0 |
| N/A   31C    P0    35W / 250W |      0MiB / 40536MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-PCI...  On   | 00000000:3B:00.0 Off |                    0 |
| N/A   29C    P0    31W / 250W |      0MiB / 40536MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA A100-PCI...  On   | 00000000:86:00.0 Off |                    0 |
| N/A   52C    P0    39W / 250W |      0MiB / 40536MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA A100-PCI...  On   | 00000000:AF:00.0 Off |                    0 |
| N/A   30C    P0    32W / 250W |      0MiB / 40536MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

So my question is: why are the GPUs not available in the virtualenv? Does it have to do with the CUDA version (I'm no expert on this topic) or with the TensorFlow version (I know that versions 1 and 2 differ a lot)? And how can I fix it?
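One likely cause worth ruling out first: in the 1.x line, GPU support shipped in the separate `tensorflow-gpu` wheel, so `pip install tensorflow==1.9` gives a CPU-only build by design. A minimal sketch of what I would try inside the virtualenv (assuming Linux, and that CUDA 9.0 / cuDNN 7 are installed; the CUDA path below is an assumption, adjust it to your system):

```shell
# inside the activated virtualenv
pip uninstall -y tensorflow tensorflow-gpu   # remove any CPU-only wheel first
pip install tensorflow-gpu==1.9.0            # the CUDA-enabled 1.x wheel

# make the CUDA 9.0 runtime visible to the process
# (hypothetical install location, change as needed)
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64:$LD_LIBRARY_PATH

# the package is still imported as `tensorflow`, never `tensorflow-gpu`
python -c "import tensorflow as tf; print(tf.test.is_built_with_cuda())"
```

If the last line prints `False`, the virtualenv is still resolving a CPU-only wheel and no amount of CUDA configuration will help.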

SilentCloud
  • Different tensorflow versions require different CUDA versions, look at your tensorflow logs for more information. – Dr. Snoopy Aug 20 '22 at 11:27
  • Hey @Dr.Snoopy thanks for the comment. I have now installed CUDA and cuDNN with the correct versions, but TF still does not see the GPUs. Any idea/suggestion? – SilentCloud Sep 13 '22 at 14:17
  • Is there any warning when you `import tensorflow as tf`? – André Sep 13 '22 at 14:25
  • @André if I `pip install tensorflow==1.9` I have no errors, but the GPU is not available. If I uninstall TF and then `pip install tensorflow-gpu==1.9` I get errors, for example `tensorflow has no attribute '__version__'`. I don't know if I am supposed to import `tensorflow-gpu` somehow – SilentCloud Sep 13 '22 at 14:36
    AFAIK `import tensorflow as tf` should work also for the GPU version, but I didn't use TF1 a lot. You can also try to install tensorflow gpu using `conda install`. In my experience there you have a smaller risk to misconfigure your installs. – André Sep 13 '22 at 15:24
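Following up on the conda suggestion in the last comment, a sketch of that route (channel availability for such an old build is an assumption; the appeal is that conda resolves a matching `cudatoolkit`/`cudnn` alongside the wheel, sidestepping the system CUDA install entirely):

```shell
conda create -n tf19 python=3.6   # TF 1.9 predates Python 3.7 support
conda activate tf19
conda install tensorflow-gpu=1.9  # pulls cudatoolkit 9.0 + cudnn as dependencies
python -c "import tensorflow as tf; print(tf.__version__)"
```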

0 Answers