TensorFlow: Why does cuDNN fail to launch? CUDA_ERROR_LAUNCH_FAILED

Question

Background:

I have been using TensorFlow + Keras for deep learning for 3-4 months on my laptop. I usually write Python scripts and invoke them from the shell. Today I decided to try a Jupyter notebook for running a deep neural network from one of my scripts. When Keras started the first epoch, I got the following error:

Error:

Using TensorFlow backend.
WARNING:tensorflow:From D:\Anaconda3\envs\tensorflow-gpu-resonator\lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From D:\Anaconda3\envs\tensorflow-gpu-resonator\lib\site-packages\tensorflow\python\ops\math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Epoch 1/5
2019-03-16 15:19:11.414481: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2019-03-16 15:19:11.971669: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce 940MX major: 5 minor: 0 memoryClockRate(GHz): 1.189
pciBusID: 0000:01:00.0
totalMemory: 4.00GiB freeMemory: 3.35GiB
2019-03-16 15:19:11.981427: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-16 15:19:26.748014: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-16 15:19:26.752914: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0
2019-03-16 15:19:26.757983: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N
2019-03-16 15:19:26.792177: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3058 MB memory) -> physical GPU (device: 0, name: GeForce 940MX, pci bus id: 0000:01:00.0, compute capability: 5.0)
2019-03-16 15:19:29.466453: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library cublas64_100.dll locally
2019-03-16 15:19:54.230708: E tensorflow/stream_executor/cuda/cuda_driver.cc:981] failed to synchronize the stop event: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-03-16 15:19:54.236093: E tensorflow/stream_executor/cuda/cuda_timer.cc:55] Internal: error destroying CUDA event in context 0000016C5B168AD0: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-03-16 15:19:54.243638: E tensorflow/stream_executor/cuda/cuda_timer.cc:60] Internal: error destroying CUDA event in context 0000016C5B168AD0: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-03-16 15:19:54.250401: F tensorflow/stream_executor/cuda/cuda_dnn.cc:194] Check failed: status == CUDNN_STATUS_SUCCESS (7 vs. 0)Failed to set cuDNN stream.

Solutions Tried:

As suggested by multiple similar issues on GitHub, I tried updating my CUDA version, my TensorFlow version, and my graphics drivers. Currently I am using:

CUDA Version : 10.0.130
cuDNN Version : 7.3.1
TensorFlow GPU : 1.13.1
Keras GPU : 2.2.4
OS : Windows 10 64 bit Version 1809
GPU : NVIDIA GeForce 940MX
GPU Driver Version : 419.35 released 03-05-2019
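
A notebook kernel can run a different interpreter than the shell does, so the versions listed above are worth confirming from inside the notebook itself. A minimal sketch for doing that, using only the standard library plus whatever packages are importable; the helper name `describe_environment` is made up for illustration and is not part of TensorFlow or Keras:

```python
import sys
from importlib import import_module


def describe_environment(package_names=("tensorflow", "keras")):
    """Return the interpreter location and the version of each importable package."""
    report = {"executable": sys.executable, "prefix": sys.prefix}
    for name in package_names:
        try:
            module = import_module(name)
        except ImportError:
            # The package is not installed for this interpreter.
            report[name] = None
        else:
            # Most packages expose __version__; fall back to a placeholder if not.
            report[name] = getattr(module, "__version__", "unknown")
    return report
```

Running `print(describe_environment())` once in a notebook cell and once in a script started from the shell should give identical output if both use the same conda environment.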

I thought this must be because TensorFlow is simply unable to connect to the GPU properly, so I ran the TensorFlow GPU test from here, and it seems to work. (Check out the image here, since I don't have enough reputation.)
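
The exact test is not reproduced in this post. As a rough sketch of what a device check of this kind can look like, assuming TensorFlow is importable in the current environment, `device_lib.list_local_devices()` reports the devices TensorFlow can see; the wrapper name `list_gpu_devices` is hypothetical:

```python
def list_gpu_devices():
    """Return the names of GPU devices visible to TensorFlow, or None if it is not installed."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        # TensorFlow is not available in this interpreter.
        return None
    # Each entry describes one device; keep only the GPU ones.
    return [
        device.name
        for device in device_lib.list_local_devices()
        if device.device_type == "GPU"
    ]
```

A check like this only enumerates devices. It does not run a cuDNN kernel, so it can succeed even when a launch fails later during training.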

Judging from #1, #2, and #3, this seems to be an error caused by unauthorized memory access.

How should I solve this problem? Since it seemed related to memory management, I thought a reboot might fix it, but I still get the same error.

[P.S.: All the software libraries, such as cuDNN and CUDA, were installed through Anaconda3 with Python 3.6.6.]

Try creating a new Anaconda virtual environment, reinstalling the requirements in it, and running the notebook again. — Khaldoun Nd, Mar 16 '19 at 10:29
Also, you need to check whether the CUDA and cuDNN versions match. This happened to me once, and switching back to older versions worked. — Khaldoun Nd, Mar 16 '19 at 10:30
@KhaldounNd thanks for the suggestion. However, can you give me an intuition for why the previous environment somehow got corrupted? — ashutoshbsathe, Mar 16 '19 at 10:31
I believe it has something to do with the Jupyter notebook not using the same library versions as your scripts, or with a dependency that somehow got updated or corrupted. Are the scripts and the old code you wrote before still working? — Khaldoun Nd, Mar 16 '19 at 10:34
I really suggest you reinstall the required libraries, but use older versions that have been verified to work. This is the only thing I can think of. — Khaldoun Nd, Mar 16 '19 at 10:43
