18

I ran my model successfully with LSTM as the first layer. Out of curiosity, I then replaced LSTM with CuDNNLSTM, but model.fit now fails with the following error message:

UnknownError: Fail to find the dnn implementation.
    [[{{node cu_dnnlstm_5/CudnnRNN}} = CudnnRNN[T=DT_FLOAT, _class=["loc:@training_2/Adam/gradients/cu_dnnlstm_5/CudnnRNN_grad/CudnnRNNBackprop"], direction="unidirectional", dropout=0, input_mode="linear_input", is_training=true, rnn_mode="lstm", seed=87654321, seed2=0, _device="/job:localhost/replica:0/task:0/device:GPU:0"](cu_dnnlstm_5/transpose, cu_dnnlstm_5/ExpandDims_1, cu_dnnlstm_5/ExpandDims_1, cu_dnnlstm_5/concat_1)]]
    [[{{node metrics_3/mean_squared_error/Mean_1/_1877}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4852_metrics_3/mean_squared_error/Mean_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

I tried TestCudnnLSTM() from this discussion, and the test passes:

Keras version: 2.2.4
Tensorflow version: 1.12.0
Creating Model
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
cu_dnnlstm_1 (CuDNNLSTM)     (None, 1000, 1)           16        
=================================================================
Total params: 16
Trainable params: 16
Non-trainable params: 0
_________________________________________________________________
None
Model compiled

So the problem seems to appear during model fitting, but I don't know what exactly is causing it.

Fay Wang
  • I regularly have this problem as well with tf 1.13 and CuDNN 7.5. However, it happens randomly, only about 10% of the time. Usually I can just start the program again and it works fine. – jlh May 16 '19 at 05:21

9 Answers

40

For TensorFlow 2, one solution is to enable GPU memory growth before building the model:

import tensorflow as tf
physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], enable=True)

After that, you can import and use Keras models as usual:

from tensorflow.keras.models import Model

Documentation

This solution worked for me. Note that it enables memory growth for the first GPU only.
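Because the snippet above touches only `physical_devices[0]`, a machine with several GPUs (or with none) may want a loop instead. The following is a minimal sketch, assuming TensorFlow 2.x. The helper name `enable_memory_growth_on_all_gpus` is made up for illustration and is not part of TensorFlow; the fallback to `tf.config.experimental.list_physical_devices` covers older 2.x releases, as one of the comments below points out.

```python
def enable_memory_growth_on_all_gpus():
    """Turn on memory growth for every visible GPU and return how many were set.

    Hypothetical helper, not part of TensorFlow. Call it before any model or
    tensor is created, because TensorFlow rejects the change once the GPU
    runtime has been initialized.
    """
    # Imported here so that this helper is the first thing to touch TensorFlow.
    import tensorflow as tf

    # Older 2.x releases expose the listing function only under `experimental`.
    list_devices = getattr(tf.config, "list_physical_devices", None)
    if list_devices is None:
        list_devices = tf.config.experimental.list_physical_devices

    gpus = list_devices("GPU")
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)
```

Calling `enable_memory_growth_on_all_gpus()` at the very top of the script should have the same effect as the snippet above on a single-GPU machine. On a CPU-only machine it returns 0 rather than failing on `physical_devices[0]`.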

Sadidul Islam
  • Thanks, but for me it is `physical_devices = tf.config.experimental.list_physical_devices('GPU')` – TX Shi Feb 16 '20 at 04:06
  • Thanks, it works even on my Windows machine now! Do you know why this is happening? I would not have guessed that `UnknownError: Fail to find the dnn implementation.` is related to memory growth. I thought that by default (without memory growth) tf allocates as much memory as possible. So why does it not work by default? – Nerxis Apr 20 '20 at 10:24
  • If you look at the documentation, you will find it says: "If memory growth is enabled for a PhysicalDevice, the runtime initialization will not allocate all memory on the device. Memory growth cannot be configured on a PhysicalDevice with virtual devices configured." That means that by default TensorFlow tries to allocate all of the device memory at once, and when there is not enough memory, it shows the error. That is only my reading of the documentation, though; I have seen the error even with a small model. – Sadidul Islam Jun 12 '20 at 05:00
  • Wow, I wasn't expecting this to solve it, thanks a lot – smoothumut Apr 15 '21 at 07:16
6

If you get this error while fitting a Keras model, put this code at the top of your script, together with your imports:

from keras.backend.tensorflow_backend import set_session
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
set_session(sess)

credit

Federricco
  • ModuleNotFoundError: No module named 'keras.backend.tensorflow_backend'; 'keras.backend' is not a package. I guess this answer was valid for older versions of keras/tf – Tobbey Jul 21 '21 at 08:08
1

Make sure your Nvidia driver version is compatible with the version of CUDA you are using. You can check the requirements here: https://docs.nvidia.com/deploy/cuda-compatibility/index.html#binary-compatibility

I'm using CUDA 9.0, but my Nvidia driver was older than 384.81. Updating the driver to a newer version fixed the problem for me.
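The comparison described above can be sketched in a few lines of Python. This is only an illustration: the names `MIN_DRIVER_FOR_CUDA`, `parse_version`, `driver_is_new_enough` and `installed_driver_version` are hypothetical, and the table holds just the one pair quoted in this answer (CUDA 9.0 and driver 384.81). Entries for other CUDA versions would have to come from the NVIDIA compatibility page linked above.

```python
import subprocess

# Minimum driver quoted in this answer for CUDA 9.0. Other CUDA versions are
# deliberately left out; take them from NVIDIA's compatibility page.
MIN_DRIVER_FOR_CUDA = {"9.0": "384.81"}


def parse_version(text):
    """Convert a dotted version string such as '384.81' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_new_enough(driver_version, cuda_version):
    """Return True if the driver meets the minimum listed for cuda_version."""
    minimum = MIN_DRIVER_FOR_CUDA[cuda_version]
    return parse_version(driver_version) >= parse_version(minimum)


def installed_driver_version():
    """Ask nvidia-smi for the driver version of the first GPU.

    Requires an installed NVIDIA driver that provides the nvidia-smi tool.
    """
    output = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        text=True,
    )
    return output.splitlines()[0].strip()
```

On a machine with the driver installed, `driver_is_new_enough(installed_driver_version(), "9.0")` would indicate whether an update like the one described here is needed.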

Nissan
1

I had the same issue when I updated tensorflow to 1.12. The error was resolved after I updated my CuDNN version from 7 to 7.5. I followed the steps at the URL below to update CuDNN (note: the steps are written for installing CuDNN, but they apply to updating as well):

https://jhui.github.io/2017/09/07/AWS-P2-CUDA-CuDNN-TensorFlow/

1

In TensorFlow 2.0 I got the same error while running an RNN LSTM model. The cause was that my cuDNN version was too old. The TensorFlow GPU requirements page recommends

cuDNN SDK >= 7.4.1.

See https://www.tensorflow.org/install/gpu for more details.

I also asked about this in the TensorFlow Reddit forum:

https://www.reddit.com/r/tensorflow/comments/dxnnq2/i_am_getting_an_error_while_running_the_rnn_lstm/?utm_source=share&utm_medium=web2x

FrozenWolf
1

I would recommend checking whether any other kernel has imported tensorflow or keras. If so, shut down that kernel, even if it is not busy. That solved the problem in my case.

knb
0

I installed tensorflow and keras using conda in the virtual env, and this solved it.

conda install tensorflow
conda install keras
MStanley
0

Also check that cuDNN is installed for the CUDA version your application uses.

Upgrading TensorFlow can make it use a different CUDA version.

For instance, tensorflow-2.3 uses CUDA 10.1, but tensorflow-2.5 uses CUDA 11.2.

I got the same error on Windows and had to copy the latest cuDNN DLLs into the "c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2" folder.
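To see which CUDA and cuDNN versions a given TensorFlow install expects, the build information can be queried directly. The sketch below assumes TensorFlow 2.3 or later, where `tf.sysconfig.get_build_info()` is available. The helper name `report_gpu_build_info` and the "not a CUDA build" placeholder are made up for illustration.

```python
def report_gpu_build_info():
    """Return the CUDA and cuDNN versions the installed TensorFlow was built with.

    Hypothetical helper, not part of TensorFlow. CPU-only builds do not report
    CUDA keys, so a placeholder string is returned for them.
    """
    # Imported lazily so this file can be loaded on machines without TensorFlow.
    import tensorflow as tf

    info = tf.sysconfig.get_build_info()
    return {
        "tensorflow": tf.__version__,
        "cuda": info.get("cuda_version", "not a CUDA build"),
        "cudnn": info.get("cudnn_version", "not a CUDA build"),
    }
```

Printing `report_gpu_build_info()` after an upgrade should show whether the CUDA version has changed, and therefore which CUDA folder the cuDNN files need to go into.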

Andrei Pop
0

My code worked after I checked the versions of the following packages: cuda, cudnn, tensorflow and gcc. You need to find mutually compatible versions of all of them. Hope it helps!

My versions are:

  • Cuda 11.1
  • Gcc-9
  • Cudnn-8.2
  • Tensorflow-2.6
  • keras-2.6
  • python-3.6
Community
  • Instead of simply providing the answer directly, try writing a detailed comment that explains the solution, as long as the explanation is not too lengthy. @Anthoney S. – DSDmark Dec 20 '22 at 05:07