Background:
I have been using TensorFlow + Keras for deep learning on my laptop for 3-4 months. I usually write Python scripts and run them from the shell. Today I decided to try a Jupyter notebook to experiment with a deep neural network from one of my scripts. When Keras started the first training epoch, I was met with the following error:
Error:
Using TensorFlow backend.
WARNING:tensorflow:From D:\Anaconda3\envs\tensorflow-gpu-resonator\lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From D:\Anaconda3\envs\tensorflow-gpu-resonator\lib\site-packages\tensorflow\python\ops\math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Epoch 1/5
2019-03-16 15:19:11.414481: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2019-03-16 15:19:11.971669: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce 940MX major: 5 minor: 0 memoryClockRate(GHz): 1.189
pciBusID: 0000:01:00.0
totalMemory: 4.00GiB freeMemory: 3.35GiB
2019-03-16 15:19:11.981427: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-16 15:19:26.748014: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-16 15:19:26.752914: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-03-16 15:19:26.757983: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-03-16 15:19:26.792177: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3058 MB memory) -> physical GPU (device: 0, name: GeForce 940MX, pci bus id: 0000:01:00.0, compute capability: 5.0)
2019-03-16 15:19:29.466453: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library cublas64_100.dll locally
2019-03-16 15:19:54.230708: E tensorflow/stream_executor/cuda/cuda_driver.cc:981] failed to synchronize the stop event: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-03-16 15:19:54.236093: E tensorflow/stream_executor/cuda/cuda_timer.cc:55] Internal: error destroying CUDA event in context 0000016C5B168AD0: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-03-16 15:19:54.243638: E tensorflow/stream_executor/cuda/cuda_timer.cc:60] Internal: error destroying CUDA event in context 0000016C5B168AD0: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-03-16 15:19:54.250401: F tensorflow/stream_executor/cuda/cuda_dnn.cc:194] Check failed: status == CUDNN_STATUS_SUCCESS (7 vs. 0)Failed to set cuDNN stream.
Solutions Tried:
As suggested in multiple similar issues on GitHub, I tried updating CUDA, TensorFlow, and my graphics drivers. I am currently using:
CUDA Version : 10.0.130
cuDNN Version : 7.3.1
TensorFlow GPU : 1.13.1
Keras GPU : 2.2.4
OS : Windows 10 64 bit Version 1809
GPU : NVIDIA GeForce 940MX
GPU Driver Version : 419.35 released 03-05-2019
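A Jupyter kernel does not necessarily run in the same conda environment as the shell, so it may be worth confirming these versions from inside the notebook itself. A minimal sketch, using only the standard library plus whichever of `tensorflow`/`keras` happens to be importable; `report_versions` is a hypothetical helper name:

```python
import importlib
import platform
import sys


def report_versions(modules=("tensorflow", "keras")):
    """Collect the interpreter path and module versions visible to this kernel."""
    info = {
        "python": platform.python_version(),
        # Path of the running interpreter; for this setup it would be expected
        # to live inside D:\Anaconda3\envs\tensorflow-gpu-resonator
        "executable": sys.executable,
    }
    for name in modules:
        try:
            module = importlib.import_module(name)
        except ImportError:
            info[name] = None  # not installed in this environment
        else:
            info[name] = getattr(module, "__version__", "unknown")
    return info
```

In a notebook cell, `print(report_versions())` shows which interpreter and library versions the kernel actually uses, which can then be compared with the list above.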
I suspected that TensorFlow was simply unable to connect to the GPU properly, so I ran the TensorFlow GPU test from here, and it seems to work (check out the image here, since I don't have enough reputation to embed it).
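A similar check can also be run from inside the notebook kernel. A minimal sketch, assuming TensorFlow 1.x; `gpu_devices` and `print_local_gpus` are hypothetical helper names, and `tensorflow.python.client.device_lib` is an internal module rather than a documented public API:

```python
def gpu_devices(devices):
    """Return (name, memory in MiB) for every GPU entry in a device list.

    `devices` is expected to look like the output of
    tensorflow.python.client.device_lib.list_local_devices(), whose entries
    expose `name`, `device_type` and `memory_limit` (in bytes).
    """
    return [
        (d.name, d.memory_limit // (1024 * 1024))
        for d in devices
        if d.device_type == "GPU"
    ]


def print_local_gpus():
    """Print the GPUs that TensorFlow can see from the current kernel."""
    # Imported lazily so the helpers can be defined even without TensorFlow.
    import tensorflow as tf
    from tensorflow.python.client import device_lib

    print("Built with CUDA:", tf.test.is_built_with_cuda())
    for name, mib in gpu_devices(device_lib.list_local_devices()):
        print(name, mib, "MiB")
```

Calling `print_local_gpus()` in a cell should list the GeForce 940MX if the notebook kernel sees the same GPU as the shell.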
Judging from #1, #2 and #3, this seems to be an error caused by unauthorized memory access.
How should I solve this problem? Since it seemed related to memory management, I thought a reboot might solve it, but I still get the same error.
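One commonly tried mitigation for GPU memory trouble in a long-lived Jupyter kernel is to stop TensorFlow from reserving almost all GPU memory up front. A sketch, assuming TensorFlow 1.13 with standalone Keras 2.2.4 as listed above; `use_growing_gpu_memory` is a hypothetical helper, and it is not confirmed to fix the `CUDA_ERROR_LAUNCH_FAILED` shown in the log:

```python
def use_growing_gpu_memory(memory_fraction=None):
    """Give Keras a TF 1.x session that allocates GPU memory on demand.

    Assumes TensorFlow 1.x (tf.ConfigProto / tf.Session) and standalone
    Keras 2.2.x; call it before building or loading any model.
    """
    import tensorflow as tf
    from keras import backend as K

    config = tf.ConfigProto()
    # Grow the allocation as needed instead of grabbing it all at start-up.
    config.gpu_options.allow_growth = True
    if memory_fraction is not None:
        # Optional hard cap, e.g. 0.7 means at most 70% of the card's memory.
        config.gpu_options.per_process_gpu_memory_fraction = memory_fraction
    K.set_session(tf.Session(config=config))
```

The idea is to run `use_growing_gpu_memory()` in the first cell, before any model is built; between experiments in the same kernel, `K.clear_session()` discards the old graph instead of stacking new models on top of it.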
[P.S.: All the software libraries, such as cuDNN and CUDA, are installed through Anaconda3 with Python 3.6.6.]