CuDNNLSTM: UnknownError: Fail to find the dnn implementation

Question

I ran a model with LSTM as its first layer successfully. Out of curiosity, I then replaced LSTM with CuDNNLSTM, but model.fit now fails with the following error message:

UnknownError: Fail to find the dnn implementation.
    [[{{node cu_dnnlstm_5/CudnnRNN}} = CudnnRNN[T=DT_FLOAT, _class=["loc:@training_2/Adam/gradients/cu_dnnlstm_5/CudnnRNN_grad/CudnnRNNBackprop"], direction="unidirectional", dropout=0, input_mode="linear_input", is_training=true, rnn_mode="lstm", seed=87654321, seed2=0, _device="/job:localhost/replica:0/task:0/device:GPU:0"](cu_dnnlstm_5/transpose, cu_dnnlstm_5/ExpandDims_1, cu_dnnlstm_5/ExpandDims_1, cu_dnnlstm_5/concat_1)]]
    [[{{node metrics_3/mean_squared_error/Mean_1/_1877}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4852_metrics_3/mean_squared_error/Mean_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

I ran the TestCudnnLSTM() test from this discussion, and it passed:

Keras version: 2.2.4
Tensorflow version: 1.12.0
Creating Model
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
cu_dnnlstm_1 (CuDNNLSTM)     (None, 1000, 1)           16        
=================================================================
Total params: 16
Trainable params: 16
Non-trainable params: 0
_________________________________________________________________
None
Model compiled

So the problem seems to appear only during model fitting, but I don't know exactly what it is.

I regularly have this problem as well with tf 1.13 and CuDNN 7.5. However, it happens randomly, only about 10% of the time. Usually I can just start the program again and it works fine. – jlh, May 16 '19 at 05:21

score 40 · Answer 1 · answered Jan 12 '20 at 19:16

For TensorFlow v2, one solution is to enable GPU memory growth before building the model:

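import tensorflow as tf

# Run this before any model or tensor is created.
physical_devices = tf.config.list_physical_devices('GPU')
# Let the GPU allocation grow on demand instead of being reserved up front.
tf.config.experimental.set_memory_growth(physical_devices[0], enable=True)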

After that you can import and use Keras models as usual:

from tensorflow.keras.models import Model

Documentation

This solution worked for me. Note that it enables memory growth for the first GPU only.
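A slightly more defensive variant, as a sketch only: it enables memory growth on every visible GPU and does nothing when no GPU (or no TensorFlow install) is found. The helper name `enable_memory_growth` is made up for illustration, and the fallback to the `experimental` listing function is an assumption that covers older 2.x releases.

```python
def enable_memory_growth():
    """Enable memory growth on every visible GPU; return how many were configured."""
    try:
        import tensorflow as tf
    except ImportError:
        return 0  # TensorFlow is not installed in this environment

    # Older TF 2.x releases only expose the experimental name.
    list_devices = getattr(tf.config, "list_physical_devices", None)
    if list_devices is None:
        list_devices = tf.config.experimental.list_physical_devices

    configured = 0
    for gpu in list_devices("GPU"):
        try:
            tf.config.experimental.set_memory_growth(gpu, True)
            configured += 1
        except RuntimeError:
            # Raised when the GPUs were already initialized; call this earlier.
            pass
    return configured


# Call once at startup, before building any model.
print("GPUs with memory growth enabled:", enable_memory_growth())
```

If this prints 0 on a machine that has a GPU, TensorFlow is not seeing the device at all, and memory growth is probably not the issue.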

Sadidul Islam

thanks but for me it is `physical_devices = tf.config.experimental.list_physical_devices('GPU')` – TX Shi Feb 16 '20 at 04:06
Thanks, it works even on my Windows machine now! Do you know why this is happening? I would not have guessed that `UnknownError: Fail to find the dnn implementation.` is somehow related to memory growth. And by default (memory growth not enabled) I thought tf would allocate as much memory as possible. So why is it not working by default? – Nerxis Apr 20 '20 at 10:24
If you see the documentation you will find they say - "If memory growth is enabled for a PhysicalDevice, the runtime initialization will not allocate all memory on the device. Memory growth cannot be configured on a PhysicalDevice with virtual devices configured." That means by default TensorFlow tries to allocate all of the device's memory at once, and when memory runs short it shows this error. That is only my reading of the documentation, though; I have seen the error even with a small model. – Sadidul Islam Jun 12 '20 at 05:00
Wow, I wasn't expecting this to solve it, thanks a lot – smoothumut Apr 15 '21 at 07:16

score 6 · Answer 2 · answered Aug 05 '19 at 11:54

If you're getting this error while fitting a Keras NN, add this code at the top of your script, with your imports:

from keras.backend.tensorflow_backend import set_session
import tensorflow as tf

# TF 1.x API: allocate GPU memory on demand instead of all at once.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
set_session(sess)  # make Keras use this session

credit

Federricco

ModuleNotFoundError: No module named 'keras.backend.tensorflow_backend'; 'keras.backend' is not a package. I guess this answer was valid for older versions of keras/tf – Tobbey Jul 21 '21 at 08:08

score 1 · Answer 3 · answered Feb 11 '19 at 11:32

Make sure you have the proper Nvidia driver version for the version of CUDA you are using. You can check it out here. https://docs.nvidia.com/deploy/cuda-compatibility/index.html#binary-compatibility

I'm using CUDA 9.0, but my Nvidia driver was older than 384.81. Updating the Nvidia driver to a newer one fixed the problem for me.
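To check the installed driver against that minimum, something along these lines may help. It is a rough sketch that assumes `nvidia-smi` is on your PATH; the helper names are made up for illustration, and the 384.81 figure is simply the one quoted above for CUDA 9.0.

```python
import re
import shutil
import subprocess


def parse_version(text):
    """Turn '384.81' into (384, 81) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed, required):
    """True if the installed version is at least the required one."""
    return parse_version(installed) >= parse_version(required)


def installed_driver_version():
    """Return the driver version reported by nvidia-smi, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    lines = result.stdout.strip().splitlines()
    if result.returncode != 0 or not lines:
        return None
    version = lines[0].strip()
    # Ignore anything that does not look like a dotted version number.
    return version if re.fullmatch(r"\d+(\.\d+)*", version) else None


MIN_DRIVER_FOR_CUDA_9_0 = "384.81"  # figure quoted in this answer

driver = installed_driver_version()
if driver is None:
    print("nvidia-smi not found or returned no driver version")
else:
    status = "OK" if meets_minimum(driver, MIN_DRIVER_FOR_CUDA_9_0) else "too old"
    print(f"driver {driver}: {status} for CUDA 9.0")
```

For other CUDA releases, replace the minimum with the value from the compatibility table linked above.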

score 1 · Answer 4 · answered Feb 28 '19 at 09:35

I had the same issue when I updated TensorFlow to 1.12. The error was resolved after I updated my cuDNN version from 7 to 7.5. I followed the steps at the URL below to update cuDNN. (Note: the steps in the link are for installing cuDNN, but they apply to updating as well.)

https://jhui.github.io/2017/09/07/AWS-P2-CUDA-CuDNN-TensorFlow/

score 1 · Answer 5 · answered Nov 18 '19 at 14:33

In TensorFlow 2.0 I got the same error while running an RNN LSTM model. The cause was that my cuDNN version was too old. The TensorFlow GPU requirements page recommends:

cuDNN SDK >= 7.4.1.

For more details, see https://www.tensorflow.org/install/gpu

I also asked about this in the TensorFlow Reddit forum:

https://www.reddit.com/r/tensorflow/comments/dxnnq2/i_am_getting_an_error_while_running_the_rnn_lstm/?utm_source=share&utm_medium=web2x

score 1 · Answer 6 · edited Jun 03 '21 at 16:48

I would recommend checking whether any other kernel has imported tensorflow or keras. If so, shut down that kernel, even if it is not busy. That solved the problem in my case.
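One way to spot such a leftover kernel is to list the processes currently holding GPU memory. This is a sketch that assumes `nvidia-smi` is on your PATH; the function names are hypothetical.

```python
import shutil
import subprocess


def parse_compute_apps(csv_text):
    """Parse 'pid, process_name, used_memory' rows into a list of dicts."""
    apps = []
    for line in csv_text.strip().splitlines():
        # pid is the first field and memory the last; the name sits in between.
        pid, first_sep, rest = line.partition(",")
        name, last_sep, memory = rest.rpartition(",")
        if first_sep and last_sep and pid.strip().isdigit():
            apps.append({
                "pid": int(pid),
                "name": name.strip(),
                "memory": memory.strip(),
            })
    return apps


def gpu_processes():
    """Return the processes nvidia-smi reports as using the GPU, if any."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi",
         "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    return parse_compute_apps(result.stdout) if result.returncode == 0 else []


for app in gpu_processes():
    print(app["pid"], app["name"], app["memory"])
```

Any Python or Jupyter kernel process in that list other than the one you are training in is a candidate for shutting down.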

edited Jun 03 '21 at 16:48

knb


answered Jun 03 '21 at 05:47

SWARALIPI BOSE


score 0 · Answer 7 · answered Apr 22 '20 at 19:26

I installed tensorflow and keras using conda in the virtual env, and this solved it.

conda install tensorflow
conda install keras

MStanley

score 0 · Answer 8 · answered Aug 07 '21 at 13:36

Also check that cuDNN is installed for the CUDA version your application uses.

Upgrading TensorFlow can switch it to a different CUDA version. For instance, tensorflow-2.3 uses CUDA 10.1, but tensorflow-2.5 uses CUDA 11.2.

I got the same error on Windows and had to copy the latest cuDNN DLLs into the "c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2" folder.
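A quick, rough way to see whether the system loader can find a cuDNN library at all is `ctypes.util.find_library` from the standard library. The candidate names below are assumptions for illustration; adjust them to the cuDNN major version you installed. TensorFlow's own lookup may differ, so treat a miss as a hint and not as proof.

```python
import ctypes.util

# Candidate library names (assumed; adjust to your cuDNN major version).
CANDIDATES = ["cudnn", "cudnn64_8", "cudnn64_7"]


def find_cudnn(candidates=CANDIDATES):
    """Return (name, location) for the first cuDNN library found, else None."""
    for name in candidates:
        location = ctypes.util.find_library(name)
        if location:
            return name, location
    return None


found = find_cudnn()
if found is None:
    print("No cuDNN library found on the default search path")
else:
    print("Found cuDNN:", found[0], "at", found[1])
```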

score 0 · Answer 9 · edited Dec 30 '22 at 07:54

My code worked after I checked the versions of the following packages: cuda, cudnn, tensorflow and gcc. You need mutually compatible versions of all of them. Hope it helps!

My versions are below:

Cuda 11.1
Gcc-9
Cudnn-8.2
Tensorflow-2.6
keras-2.6
python-3.6
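To collect the same kind of version list on your own machine, a sketch like the following may help. It assumes Python 3.8+ for `importlib.metadata` and TensorFlow 2.3+ for `tf.sysconfig.get_build_info()`; the function names are made up for illustration. CPU-only builds may not report CUDA or cuDNN versions, and a package installed under another name (for example `tensorflow-gpu`) will show up as `None`.

```python
import platform
from importlib import metadata


def installed_versions(packages=("tensorflow", "keras")):
    """Report the Python version and the installed version of each package."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # not installed under this name
    return report


def tensorflow_build_info():
    """CUDA/cuDNN versions TensorFlow was built against, or {} if unavailable."""
    try:
        import tensorflow as tf
        info = tf.sysconfig.get_build_info()
    except (ImportError, AttributeError):
        return {}
    return {key: info.get(key) for key in ("cuda_version", "cudnn_version")}


print(installed_versions())
print(tensorflow_build_info())
```

Compare the reported build versions with the CUDA and cuDNN versions actually installed on the system; a mismatch between them is what this answer is about.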

answered Dec 13 '22 at 15:03

Anthoney S


Instead of simply providing the answer directly, try writing a detailed comment that explains the solution, as long as the explanation is not too lengthy. @Anthoney S. – DSDmark Dec 20 '22 at 05:07
