(Similar to this thread but not quite the same)
For a project I am forced to run some code with TensorFlow version 1.9.0. I created a virtualenv and installed all the packages needed. The problem is that, while I can run code on my GPUs without problems in my standard environment, TensorFlow fails to detect the GPUs when I switch to the virtualenv.
Here are the details.
Standard environment:
TensorFlow version:
import tensorflow as tf
tf.__version__
'2.6.2'
Available devices:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
[name: "/device:CPU:0" device_type: "CPU" memory_limit: 268435456 locality { } incarnation: 11602830920464884706
, name: "/device:GPU:0" device_type: "GPU" memory_limit: 40294875136 ... physical_device_desc: "device: 0, name: NVIDIA A100-PCIE-40GB, pci bus id: 0000:18:00.0, compute capability: 8.0"
, name: "/device:GPU:1" device_type: "GPU" memory_limit: 40294875136 ... physical_device_desc: "device: 1, name: NVIDIA A100-PCIE-40GB, pci bus id: 0000:3b:00.0, compute capability: 8.0"
, name: "/device:GPU:2" device_type: "GPU" memory_limit: 40294875136 ... physical_device_desc: "device: 2, name: NVIDIA A100-PCIE-40GB, pci bus id: 0000:86:00.0, compute capability: 8.0"
, name: "/device:GPU:3" device_type: "GPU" memory_limit: 40294875136 ... physical_device_desc: "device: 3, name: NVIDIA A100-PCIE-40GB, pci bus id: 0000:af:00.0, compute capability: 8.0"
]
(locality / StreamExecutor link details trimmed for readability)
CUDA status:
tf.test.is_gpu_available(
cuda_only=False, min_cuda_compute_capability=None)
WARNING:tensorflow:From <ipython-input-5-97ecbf874269>:2: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version. Instructions for updating: Use `tf.config.list_physical_devices('GPU')` instead.
True
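As a side note, the deprecation warning points to the newer API; in the 2.6.2 environment the same check can also be written with the documented replacement (shown here just for completeness):

```python
import tensorflow as tf

# Preferred replacement for tf.test.is_gpu_available() in TF 2.x:
# returns a list of PhysicalDevice objects, one per visible GPU.
gpus = tf.config.list_physical_devices('GPU')
print(gpus)
print(len(gpus) > 0)  # same boolean answer as is_gpu_available()
```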
Virtualenv:
TensorFlow version:
import tensorflow as tf
tf.__version__
'1.9.0'
Available devices:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 7502529416575806022
]
CUDA status:
tf.test.is_gpu_available(
cuda_only=False, min_cuda_compute_capability=None)
False
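From what I've read, the 1.x `tensorflow` wheel on PyPI was CPU-only and GPU support came from the separate `tensorflow-gpu` package, so one thing I could check in the virtualenv is whether the installed binary was built with CUDA at all:

```python
import tensorflow as tf

# False means this particular build cannot use a GPU no matter which
# drivers are installed, e.g. the CPU-only "tensorflow" 1.x wheel.
print(tf.test.is_built_with_cuda())
```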
This is the output of nvidia-smi (it is exactly the same in both environments):
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.29.05 Driver Version: 495.29.05 CUDA Version: 11.5 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A100-PCI... On | 00000000:18:00.0 Off | 0 |
| N/A 31C P0 35W / 250W | 0MiB / 40536MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA A100-PCI... On | 00000000:3B:00.0 Off | 0 |
| N/A 29C P0 31W / 250W | 0MiB / 40536MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA A100-PCI... On | 00000000:86:00.0 Off | 0 |
| N/A 52C P0 39W / 250W | 0MiB / 40536MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 3 NVIDIA A100-PCI... On | 00000000:AF:00.0 Off | 0 |
| N/A 30C P0 32W / 250W | 0MiB / 40536MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
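As far as I can tell from the release notes, the TF 1.9 GPU wheels were built against CUDA 9.0 / cuDNN 7, while the driver above reports CUDA 11.5, so maybe the libraries TF 1.9 wants are simply not on the loader path inside the virtualenv. A quick way to see what each environment exposes (these are the usual CUDA environment variables, nothing specific to my setup):

```python
import os

# TF 1.9 looks for libcudart.so.9.0 / libcudnn.so.7 via the loader path;
# print the environment variables that control where it searches.
for var in ("LD_LIBRARY_PATH", "CUDA_HOME", "CUDA_VISIBLE_DEVICES"):
    print(var, "=", os.environ.get(var))
```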
So my question is: why are the GPUs not available in the virtualenv? Does it have to do with the CUDA version (I'm no expert in this topic) or with the TensorFlow version (I know that versions 1 and 2 differ a lot)? And how can I fix it?