I installed TensorFlow via Anaconda. It worked fine and recognized the GPU for some time. But a few days ago, every environment with TensorFlow suddenly stopped recognizing my GPU. Does anyone have an idea what to check?

What I've tried:

  • Created a fresh environment with python=3.7 and installed tensorflow-gpu=2.1
  • Reinstalled anaconda
  • Created a fresh environment with python=3.6 and installed tensorflow-gpu=1.9
  • Installed tensorflow-gpu=2.3 and installed missing cudatoolkit=10.1 and cudnn=7.6
  • Installed tensorflow-gpu with a specific build number, according to an open GitHub issue
  • I set the environment variable CUDA_VISIBLE_DEVICES to 0 via python (TensorFlow : failed call to cuInit: CUDA_ERROR_NO_DEVICE)
  • I updated my graphics driver
  • Removed modified registry entry HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers\TdrDelay
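For the CUDA_VISIBLE_DEVICES item above, a minimal sketch of how the variable can be set from Python. This is an illustration, not necessarily the exact code that was used: the helper name `select_gpu` is hypothetical, and the sketch assumes the variable should be set before TensorFlow is imported so that it is already in place when CUDA is initialized.

```python
import os


def select_gpu(index=0):
    """Expose only the given GPU index to CUDA for this process (hypothetical helper)."""
    # Assumption: this runs before TensorFlow is imported, so the value
    # is already present in the environment when CUDA gets initialized.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return os.environ["CUDA_VISIBLE_DEVICES"]


select_gpu(0)
# import tensorflow as tf  # import only after the variable has been set
```

It may also be worth printing `os.environ.get("CUDA_VISIBLE_DEVICES")` in a fresh shell, since a leftover value that does not match any device index could hide the GPU from CUDA.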

My test script to check for recognized devices:

import tensorflow as tf
from tensorflow.python.client import device_lib

print(device_lib.list_local_devices())

This is the output I got with every configuration:

> python check.py
2021-03-10 18:48:12.880629: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2021-03-10 18:48:14.637784: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2021-03-10 18:48:19.201572: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2021-03-10 18:48:19.705910: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2021-03-10 18:48:19.715756: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: NB-170
2021-03-10 18:48:19.721085: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: NB-170
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 10539449374211484676
]

System information

  • OS: Windows 10 Pro (Version 10.0.18363 Build 18363)
  • Graphics card: NVIDIA GeForce GTX 1650
  • Anaconda 1.10
  • Changed registry: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers\TdrDelay to 15 to train Matterport's mask r-cnn implementation
  • Graphics Driver - GEFORCE GAME READY DRIVER - Version: 461.72 WHQL; Release Date: 2021.2.25; Operating System: Windows 10 64-bit; Language: English

My nvidia-smi output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 461.72       Driver Version: 461.72       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1650   WDDM  | 00000000:01:00.0 Off |                  N/A |
| N/A   54C    P8     6W /  N/A |    132MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Update 1 (2021-03-14)

I did a fresh Anaconda install and created an environment (conda create --name tf-gpu tensorflow-gpu=2.1) on another computer I have. On that machine my GPU is recognized without any problems.

2021-03-14 14:21:33.934222: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2021-03-14 14:21:37.608844: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX
2021-03-14 14:21:37.612173: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2021-03-14 14:21:37.658982: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 970 computeCapability: 5.2
coreClock: 1.253GHz coreCount: 13 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 208.91GiB/s
2021-03-14 14:21:37.659525: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2021-03-14 14:21:38.216002: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2021-03-14 14:21:38.625300: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2021-03-14 14:21:38.660856: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2021-03-14 14:21:38.971988: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2021-03-14 14:21:39.247585: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2021-03-14 14:21:39.564512: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2021-03-14 14:21:39.565268: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2021-03-14 14:21:41.272007: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-14 14:21:41.272272: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0 
2021-03-14 14:21:41.272582: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N
2021-03-14 14:21:41.283835: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/device:GPU:0 with 2993 MB memory) -> physical GPU (device: 0, name: GeForce GTX 970, pci bus id: 0000:01:00.0, compute capability: 5.2)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 17009642916451828901
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 3139148187
locality {
  bus_id: 1
  links {
  }
}
incarnation: 5677250807137925801
physical_device_desc: "device: 0, name: GeForce GTX 970, pci bus id: 0000:01:00.0, compute capability: 5.2"
]
Dominik
  • what version of nvidia driver do you have installed? – Bijay Regmi Mar 10 '21 at 23:00
  • GEFORCE GAME READY DRIVER: Version: 461.72 WHQL; Release Date: 2021.2.25; Operating System: Windows 10 64-bit; Language: English – Dominik Mar 11 '21 at 07:58
  • For Cuda 11.2 you need cuDNN 8.0, you can find the list here https://developer.nvidia.com/rdp/cudnn-archive – Bijay Regmi Mar 11 '21 at 08:33
  • Don't forget to include them in your PATH – Bijay Regmi Mar 11 '21 at 08:34
  • I am installing CUDA and cuDNN via Anaconda. If I read my output correctly, CUDA is loaded fine, so it shouldn't be an error with my PATH. The CUDA version shown in the nvidia-smi output is not an installed version but the highest version that the graphics driver supports. So my graphics driver supports CUDA up to version 11.2. (See the first comment on this question: https://stackoverflow.com/q/53422407/4295853) – Dominik Mar 11 '21 at 09:13
  • Looks like CUDA 10.1 is also installed: `Successfully opened dynamic library cudart64_101.dll`. If you have multiple CUDA versions, follow this [guide](https://www.tensorflow.org/install/gpu#windows_setup) to set up the GPU on a Windows system. Thanks! –  Mar 12 '21 at 03:18
  • Does this answer your question? [TensorFlow : failed call to cuInit: CUDA\_ERROR\_NO\_DEVICE](https://stackoverflow.com/questions/48658204/tensorflow-failed-call-to-cuinit-cuda-error-no-device) – Tides Jun 01 '21 at 14:22

1 Answer

In my case I was getting the same error: failed call to cuInit: CUDA_ERROR_NO_DEVICE. However, nvidia-smi.exe was detecting the GPU. I had CUDA 9.0 installed on my system (Windows 10). Then I realized that I had accidentally placed a CUDA 10.0 version of nvcuda.dll in my application path. Removing this DLL from my application path solved the problem.
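A rough sketch of how such a stray copy could be looked for, assuming it sits either in the current working directory or in one of the PATH directories. The helper name `find_dll_copies` is hypothetical; it only lists candidates and leaves the decision about which copy is the wrong one to the reader.

```python
import os


def find_dll_copies(dll_name, search_dirs):
    """Return the full path of every file named dll_name found directly in search_dirs."""
    matches = []
    seen = set()
    for directory in search_dirs:
        # Skip empty PATH entries and directories that do not exist.
        if not directory or not os.path.isdir(directory):
            continue
        try:
            entries = os.listdir(directory)
        except OSError:
            # Unreadable directory (e.g. missing permissions): ignore it.
            continue
        for entry in entries:
            if entry.lower() != dll_name.lower():
                continue
            full = os.path.join(directory, entry)
            key = os.path.normcase(os.path.abspath(full))
            if key not in seen:  # PATH often lists the same directory twice
                seen.add(key)
                matches.append(full)
    return matches


if __name__ == "__main__":
    path_dirs = os.environ.get("PATH", "").split(os.pathsep)
    for hit in find_dll_copies("nvcuda.dll", [os.getcwd()] + path_dirs):
        print(hit)
```

The copy installed by the graphics driver normally lives in the Windows system directory, so more than one hit, or a hit inside a project or application folder, is the thing to look at more closely.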

Tides