I installed TensorFlow via Anaconda. It worked fine and recognized the GPU for quite some time. But for the past few days, suddenly no environment with TensorFlow recognizes my GPU anymore. Does anyone have an idea what to check?
What I've tried:
- Created a fresh environment with python=3.7 and installed tensorflow-gpu=2.1
- Reinstalled anaconda
- Created a fresh environment with python=3.6 and installed tensorflow-gpu=1.9
- Installed tensorflow-gpu=2.3 and installed the missing cudatoolkit=10.1 and cudnn=7.6
- Installed tensorflow-gpu with a specific build number, according to an open GitHub issue
- Set the environment variable `CUDA_VISIBLE_DEVICES` to `0` via Python (see the question "TensorFlow : failed call to cuInit: CUDA_ERROR_NO_DEVICE")
- Updated my graphics driver
- Removed the modified registry entry `HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers\TdrDelay`
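For the `CUDA_VISIBLE_DEVICES` attempt, the order matters: CUDA reads the variable when it is initialized, so in Python it has to be assigned before the first `import tensorflow`. A system-wide value of `-1` or an empty string would hide every GPU and lead to the same `CUDA_ERROR_NO_DEVICE`, so it may be worth checking what the variable held beforehand. A minimal sketch (the helper name `set_visible_gpus` is hypothetical):

```python
import os


def set_visible_gpus(device_ids="0"):
    """Set CUDA_VISIBLE_DEVICES and return its previous value (None if unset).

    Must run before TensorFlow is imported, because CUDA reads the
    variable only once, when it is initialized.
    """
    previous = os.environ.get("CUDA_VISIBLE_DEVICES")
    if previous is not None and previous.strip() in ("", "-1"):
        # An empty string or "-1" leaves CUDA with no visible devices
        print(f"Warning: CUDA_VISIBLE_DEVICES was {previous!r}, which hides every GPU")
    os.environ["CUDA_VISIBLE_DEVICES"] = device_ids
    return previous
```

It would be called at the very top of `check.py`, before `import tensorflow as tf`.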
My test script to check for recognized devices:
```python
import tensorflow as tf
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
```
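A slightly more explicit check, assuming TensorFlow 2.1 or newer (where `tf.config.list_physical_devices` exists), separates "this TensorFlow build has no CUDA support" from "CUDA support is there but no GPU was registered". The function name `gpu_report` is illustrative; it returns a plain dict so the output can be compared across environments:

```python
def gpu_report():
    """Summarize what TensorFlow reports about CUDA support and GPUs.

    Returns a dict. If TensorFlow cannot be imported, only
    'tensorflow_available' (False) is set.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return {"tensorflow_available": False}

    return {
        "tensorflow_available": True,
        "tf_version": tf.__version__,
        # False means a CPU-only TensorFlow build is installed
        "built_with_cuda": tf.test.is_built_with_cuda(),
        # Empty list: no GPU was registered (the log shows the reason)
        "gpus": [device.name for device in tf.config.list_physical_devices("GPU")],
    }


if __name__ == "__main__":
    print(gpu_report())
```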
This is the output I got with every configuration:
```
> python check.py
2021-03-10 18:48:12.880629: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2021-03-10 18:48:14.637784: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2021-03-10 18:48:19.201572: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2021-03-10 18:48:19.705910: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2021-03-10 18:48:19.715756: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: NB-170
2021-03-10 18:48:19.721085: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: NB-170
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 10539449374211484676
]
```
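In this log `nvcuda.dll` (the driver library) is opened successfully and only the subsequent `cuInit` call fails, so TensorFlow finds the driver but the driver reports no usable device. When comparing many configurations, the error code can be pulled out of each log with a small helper like the following (the name `find_cuinit_error` and its regex are assumptions based only on the log line above):

```python
import re

# Matches lines such as:
# "... cuda_driver.cc:351] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected"
_CUINIT_RE = re.compile(r"failed call to cuInit: (CUDA_ERROR_\w+)(?:: (.*))?")


def find_cuinit_error(log_text):
    """Return (error_code, message) for the first failed cuInit call, or None."""
    for line in log_text.splitlines():
        match = _CUINIT_RE.search(line)
        if match:
            return match.group(1), (match.group(2) or "").strip()
    return None
```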
System information
- OS: Windows 10 Pro (Version 10.0.18363 Build 18363)
- Graphics card: NVIDIA GeForce GTX 1650
- Anaconda 1.10
- Changed the registry entry `HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers\TdrDelay` to 15 to train Matterport's Mask R-CNN implementation
- Graphics driver: GeForce Game Ready Driver, version 461.72 WHQL (release date 2021-02-25; Windows 10 64-bit; English)
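To confirm that the `TdrDelay` change was actually reverted, the value can be read back with Python's Windows-only `winreg` module. A sketch, assuming the value lives under the key listed above; `read_tdr_delay` is a hypothetical helper that returns `None` when the value is absent or when not running on Windows:

```python
try:
    import winreg  # standard library, available on Windows only
except ImportError:
    winreg = None

_GRAPHICS_DRIVERS_KEY = r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def read_tdr_delay():
    """Return TdrDelay in seconds, or None if it is unset or not on Windows."""
    if winreg is None:
        return None
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, _GRAPHICS_DRIVERS_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, "TdrDelay")
    except FileNotFoundError:
        # Value not present: Windows falls back to its default timeout
        return None
    return value


if __name__ == "__main__":
    print("TdrDelay:", read_tdr_delay())
```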
My nvidia-smi output:
```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 461.72 Driver Version: 461.72 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 1650 WDDM | 00000000:01:00.0 Off | N/A |
| N/A 54C P8 6W / N/A | 132MiB / 4096MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
```
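`CUDA Version: 11.2` in the `nvidia-smi` header is the newest CUDA version the 461.72 driver supports, not an installed toolkit. Because drivers are backward compatible, the CUDA 10.1 runtime that TensorFlow loads (`cudart64_101.dll` in the log) should be covered. A small sketch that parses the header and makes that comparison (function names are hypothetical, and the regex assumes the header layout shown above):

```python
import re

_HEADER_RE = re.compile(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)")


def parse_smi_header(smi_text):
    """Extract (driver_version, max_cuda_version) from nvidia-smi output."""
    match = _HEADER_RE.search(smi_text)
    if match is None:
        raise ValueError("no 'Driver Version ... CUDA Version' header found")
    return match.group(1), match.group(2)


def _as_tuple(version):
    # "10.1" -> (10, 1) so versions compare numerically, not as strings
    return tuple(int(part) for part in version.split("."))


def driver_supports(max_cuda_version, toolkit_version):
    """True if the driver's maximum CUDA version covers the toolkit version."""
    return _as_tuple(max_cuda_version) >= _as_tuple(toolkit_version)
```

With the header above, `parse_smi_header` yields `("461.72", "11.2")`, and `driver_supports("11.2", "10.1")` is `True`.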
Update 1 (2021-03-14)
I did a fresh Anaconda install on another computer I have and created an environment (`conda create --name tf-gpu tensorflow-gpu=2.1`). On that machine my GPU is recognized without any problems:
```
2021-03-14 14:21:33.934222: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2021-03-14 14:21:37.608844: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX
2021-03-14 14:21:37.612173: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2021-03-14 14:21:37.658982: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 970 computeCapability: 5.2
coreClock: 1.253GHz coreCount: 13 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 208.91GiB/s
2021-03-14 14:21:37.659525: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2021-03-14 14:21:38.216002: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2021-03-14 14:21:38.625300: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2021-03-14 14:21:38.660856: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2021-03-14 14:21:38.971988: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2021-03-14 14:21:39.247585: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2021-03-14 14:21:39.564512: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2021-03-14 14:21:39.565268: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2021-03-14 14:21:41.272007: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-14 14:21:41.272272: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
2021-03-14 14:21:41.272582: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
2021-03-14 14:21:41.283835: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/device:GPU:0 with 2993 MB memory) -> physical GPU (device: 0, name: GeForce GTX 970, pci bus id: 0000:01:00.0, compute capability: 5.2)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 17009642916451828901
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 3139148187
locality {
bus_id: 1
links {
}
}
incarnation: 5677250807137925801
physical_device_desc: "device: 0, name: GeForce GTX 970, pci bus id: 0000:01:00.0, compute capability: 5.2"
]
```