3

I am trying to train my CNN model using Keras with the TensorFlow backend.

The problem is that when I call fit_generator(), the Python kernel in Spyder crashes. For context, I had just installed CUDA and tensorflow-gpu, as described in this link, so that I could use my GPU. Before that, everything worked fine.

Here is the full log:

An error ocurred while starting the kernel

2018 20:44:44.791399: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018 20:44:45.084153: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1212] Found device 0 with properties:
name: GeForce GTX 1050 major: 6 minor: 1 memoryClockRate(GHz): 1.493
pciBusID: 0000:01:00.0
totalMemory: 4.00GiB freeMemory: 3.30GiB
2018 20:44:45.086132: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1312] Adding visible gpu devices: 0
2018 20:44:45.906189: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:993] Creating TensorFlow device (/device:GPU:0 with 3033 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018 20:47:25.845646: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1312] Adding visible gpu devices: 0
2018 20:47:25.846108: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 74 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018 20:47:26.499846: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2018 20:47:26.500247: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_dnn.cc:389] error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
2018 20:47:26.500717: F C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\kernels\conv_ops.cc:717] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo(), &algorithms)
2018 20:52:22.359428: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2018 20:52:22.359982: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_dnn.cc:389] error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
2018 20:52:22.360678: F C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\kernels\conv_ops.cc:717] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo(), &algorithms)

Has anyone ever faced this problem before?

Thanks in advance

GPhilo
user2552108

3 Answers

3

I just had the same problem and found a solution in a GitHub issue. You need to update your GPU drivers after installing CUDA 9.0, because the installer appears to revert you to older drivers. The issue also suggests rebooting, but in my case that was not necessary.
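To confirm which driver is active after the CUDA install, one option is to ask nvidia-smi for its version. The snippet below is a minimal sketch, not part of the original fix: the helper names are hypothetical, and it assumes nvidia-smi is on your PATH (on Windows it may live in the NVIDIA installation folder instead), in which case it simply returns an empty list.

```python
import shutil
import subprocess


def parse_driver_versions(smi_output):
    """Return one driver version string per non-empty line of nvidia-smi output."""
    return [line.strip() for line in smi_output.splitlines() if line.strip()]


def installed_driver_versions():
    """Query nvidia-smi for the driver version of each GPU.

    Returns an empty list if nvidia-smi is not found on PATH or exits
    with an error (hypothetical helper, for illustration only).
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []
    result = subprocess.run(
        [exe, "--query-gpu=driver_version", "--format=csv,noheader"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0:
        return []
    return parse_driver_versions(result.stdout)
```

Calling print(installed_driver_versions()) before and after updating the driver should show whether the version actually changed.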

GPhilo
0

I faced the same error installing TF 1.7 on Linux with Python 3.6 via

conda install tensorflow-gpu==1.7 -c free -y

For some reason, during one of the runs I got not only the error message

F tensorflow/core/kernels/conv_ops.cc:712] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)

but also

E tensorflow/stream_executor/cuda/cuda_dnn.cc:396] Loaded runtime CuDNN library: 7103 (compatibility version 7100) but source was compiled with 7005 (compatibility version 7000).  If using a binary install, upgrade your CuDNN library to match.  If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.

right before it. So, based on the answer here, I ran

conda install cudnn==7.0.5

to replace the cuDNN version that had been installed automatically alongside tensorflow-gpu==1.7, and now it works. However, training still takes a long time to start.
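The numbers in that error message map directly to cuDNN release versions, which is how 7005 leads to cudnn==7.0.5. Here is a small sketch of the decoding, assuming the cuDNN 7.x scheme of major*1000 + minor*100 + patch (other major releases may encode versions differently). The function names and the regular expression are hypothetical helpers written for this message format only.

```python
import re


def decode_cudnn_version(code):
    """Split a cuDNN 7.x integer version such as 7103 into (major, minor, patch)."""
    major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return (major, minor, patch)


# Matches the "Loaded runtime CuDNN library ... compiled with ..." message.
MISMATCH_RE = re.compile(
    r"Loaded runtime CuDNN library: (\d+) .*? source was compiled with (\d+)"
)


def parse_cudnn_mismatch(log_line):
    """Return (loaded, compiled) version tuples, or None if the line does not match."""
    match = MISMATCH_RE.search(log_line)
    if match is None:
        return None
    loaded, compiled = (decode_cudnn_version(int(g)) for g in match.groups())
    return loaded, compiled
```

For the message above, the loaded library decodes to 7.1.3 and the version TensorFlow was compiled against decodes to 7.0.5, so the second value is the one to install.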

49109
-1

TL;DR: Upgrade your CUDA/cuDNN installation.

The CUDA/cuDNN versions that TensorFlow requires can change with each new release, so it is very likely that your installed versions are not compatible with your TensorFlow build. The mismatch can be in either the CUDA or the cuDNN version.

Here is the table of TF version and CUDA/CUDNN version compatibility: https://www.tensorflow.org/install/source#gpu
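To find the right row in that table you first need to know which TensorFlow version you are running. The following is a rough sketch with a hypothetical helper name; it relies only on tf.__version__ and tf.test.is_built_with_cuda(), and returns None when TensorFlow cannot be imported.

```python
def describe_tensorflow():
    """Report the installed TensorFlow version and whether it is a CUDA build.

    Returns None if TensorFlow cannot be imported in this environment.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return {
        "version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
    }
```

Printing describe_tensorflow() gives the version to look up in the table. It does not check the installed CUDA or cuDNN versions, which still have to be compared by hand.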

Jongwook Choi