I was trying to use TensorFlow with GPU and got the following error:
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K20m, pci bus id: 0000:02:00.0)
E tensorflow/stream_executor/cuda/cuda_dnn.cc:347] Loaded runtime CuDNN library: 5005 (compatibility version 5000) but source was compiled with 5103 (compatibility version 5100). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
F tensorflow/core/kernels/conv_ops.cc:457] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
Of course I am trying to fix this error (it has already been asked about in "Loaded runtime CuDNN library: 5005 (compatibility version 5000) but source was compiled with 5103 (compatibility version 5100)"), but I'd also like to understand it. I always try to solve coding problems myself before asking for help, but I am having a hard time even getting started on this one because the error message seems cryptic to me, and I can't find a good resource that explains what it means.
To understand the error I focused on the line that seems to be where the error starts:
Loaded runtime CuDNN library: 5005 (compatibility version 5000) but source was compiled with 5103 (compatibility version 5100).
After reading some GitHub pages that seemed relevant, I realized that reading the error as follows is actually more helpful:
Loaded runtime CuDNN library: 5005 but source was compiled with 5103.
Removing the parentheses makes the error a bit easier to read (though I'd still like to know what role the parenthesized text plays, to ease debugging). It seems that CuDNN library 5005 was loaded at runtime (at the UNIX/OS level), but TensorFlow (for Python) was compiled against what I would guess is version 5103. If the TensorFlow library is using an API according to 5103 but the "real" API it talks to in the (CUDA) deep learning library CuDNN is version 5005, it's clear that would be a problem. These are just guesses about what's going on, though.
My first confusion is that, as far as I can tell, there is no such thing as CuDNN 5005 or 5103. It would be awesome to understand for sure what that part of the error means so that I can start debugging this for real. As far as I can tell when I run module list
I am using:
cudnn/5.0
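One reading that would reconcile these numbers with cudnn/5.0 is that they are packed integers. The sketch below assumes the encoding used by the CUDNN_VERSION macro in cudnn.h for the 5.x releases (major * 1000 + minor * 100 + patchlevel). The function name decode_cudnn_version is my own, not part of any library.

```python
def decode_cudnn_version(version):
    """Split a packed integer such as 5005 into (major, minor, patch).

    Assumption: version == major * 1000 + minor * 100 + patch, which is
    how cudnn.h defines CUDNN_VERSION for the cuDNN 5.x releases.
    """
    major, rest = divmod(version, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


print(decode_cudnn_version(5005))  # (5, 0, 5)
print(decode_cudnn_version(5103))  # (5, 1, 3)
```

Under that assumption, 5005 would be CuDNN 5.0.5 (consistent with the cudnn/5.0 module) and 5103 would be CuDNN 5.1.3.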
My second confusion is the parentheses that I ignored and what they mean:
Loaded runtime CuDNN library: 5005 (compatibility version 5000)
but source was compiled with 5103 (compatibility version 5100)
I honestly have no idea what "compatibility version XXXX" means. Maybe it's a suggestion to install version 5000 (whatever that means) of CuDNN (which is still confusing, because there is no version five thousand of CuDNN) and to compile a version of TensorFlow (somehow) that uses CuDNN version 5100.
Does someone know more precisely what these errors mean (and maybe provide a solution to the question I linked)?
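Another guess, which I have not confirmed against the TensorFlow source, is that the "compatibility version" is simply the full version number with its last two digits zeroed, and that the check fails when the two rounded values differ. The helper names below are hypothetical and only illustrate that guess.

```python
def compatibility_version(version):
    """Round a packed version down to a multiple of 100.

    Assumption (a guess, not verified in TensorFlow): the last two
    digits are a patch level that does not affect compatibility.
    """
    return (version // 100) * 100


def versions_compatible(loaded, compiled):
    """True if both versions round down to the same value."""
    return compatibility_version(loaded) == compatibility_version(compiled)


loaded, compiled = 5005, 5103  # numbers taken from the error message
print(compatibility_version(loaded))          # 5000
print(compatibility_version(compiled))        # 5100
print(versions_compatible(loaded, compiled))  # False
```

If this guess is right, the numbers in parentheses are not versions to install. They would only show that the loaded library and the one TensorFlow was built against differ by more than a patch level.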