"Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED" on a project that should work out of the box

Question

https://github.com/zzh8829/yolov3-tf2 is the project. I've installed all the correct versions of things, I think.

Google is telling me that it is probably a low-VRAM issue, but I am still looking around for other reasons. Please help. I am using:

Windows 10 (don't say "there's your problem" I need it)

cuDNN 7.4.6

CUDA 10.0

tensorflow 2.0.0

python 3.6

I have a GTX 1660 Super (6 GB VRAM) with a Ryzen 7 2700X and 16 GB of RAM. I'm getting a GTX 1080 (8 GB) in a few days, which I'm going to add to the second PCI slot.

The error is as follows:

2019-11-30 06:31:26.167368: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll                                
2019-11-30 06:31:27.843742: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED                                      
2019-11-30 06:31:27.853725: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED                                      
Traceback (most recent call last):                                                                                                                                           
  File ".\convert.py", line 34, in <module>                                                                                                                                  
    app.run(main)                                                                                                                                                            
  File "C:\Program Files\Python36\lib\site-packages\absl\app.py", line 299, in run                                                                                           
    _run_main(main, args)                                                                                                                                                    
  File "C:\Program Files\Python36\lib\site-packages\absl\app.py", line 250, in _run_main                                                                                     
    sys.exit(main(argv))                                                                                                                                                     
  File ".\convert.py", line 25, in main                                                                                                                                      
    output = yolo(img)                                                                                                                                                       
  File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py", line 891, in __call__                                                
    outputs = self.call(cast_inputs, *args, **kwargs)                                                                                                                        
  File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\keras\engine\network.py", line 708, in call                                                       
    convert_kwargs_to_constants=base_layer_utils.call_context().saving)                                                                                                      
  File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\keras\engine\network.py", line 860, in _run_internal_graph                                        
    output_tensors = layer(computed_tensors, **kwargs)                                                                                                                       
  File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py", line 891, in __call__                                                
    outputs = self.call(cast_inputs, *args, **kwargs)                                                                                                                        
  File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\keras\engine\network.py", line 708, in call                                                       
    convert_kwargs_to_constants=base_layer_utils.call_context().saving)                                                                                                      
  File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\keras\engine\network.py", line 860, in _run_internal_graph                                        
    output_tensors = layer(computed_tensors, **kwargs)                                                                                                                       
  File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py", line 891, in __call__                                                
    outputs = self.call(cast_inputs, *args, **kwargs)                                                                                                                        
  File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\keras\layers\convolutional.py", line 197, in call                                                 
    outputs = self._convolution_op(inputs, self.kernel)                                                                                                                      
  File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 1134, in __call__                                                            
    return self.conv_op(inp, filter)                                                                                                                                         
  File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 639, in __call__                                                             
    return self.call(inp, filter)                                                                                                                                            
  File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 238, in __call__                                                             
    name=self.name)                                                                                                                                                          
  File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 2010, in conv2d                                                              
    name=name)                                                                                                                                                               
  File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\ops\gen_nn_ops.py", line 1031, in conv2d                                                          
    data_format=data_format, dilations=dilations, name=name, ctx=_ctx)                                                                                                       
  File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\ops\gen_nn_ops.py", line 1130, in conv2d_eager_fallback                                           
    ctx=_ctx, name=name)                                                                                                                                                     
  File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\eager\execute.py", line 67, in quick_execute                                                      
    six.raise_from(core._status_to_exception(e.code, message), None)                                                                                                         
  File "<string>", line 3, in raise_from                                                                                                                                     
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [Op:Conv2D]

sebastian-sz · Answer 1 · 2019-12-01T21:14:42.783

1

I had the same problem in the same repository.

The solution that worked for me and my team was to upgrade cuDNN to version 7.5 or higher (as opposed to your 7.4).

The instructions for updating can be found on Nvidia's site:
https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html

edited Dec 01 '19 at 21:14

answered Dec 01 '19 at 21:08

sebastian-sz

score 1 · Answer 2 · answered Dec 02 '19 at 19:36

This could happen for a few reasons.

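(1) As you mentioned, it may be a memory issue, which you could try to verify by limiting how much GPU memory TensorFlow reserves and seeing if the error still occurs. The snippet in https://github.com/tensorflow/tensorflow/issues/25138#issuecomment-484428798 used tf.config.gpu.*, which is not available in the TF 2.0.0 release; the memory-growth setting from that snippet lives under tf.config.experimental:

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    # Grow GPU memory on demand instead of reserving nearly all of it up front.
    # Must be called before the GPU is first used.
    tf.config.experimental.set_memory_growth(gpu, True)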

# your model creation, etc.
model = MyModel(...)

I see the code you're running sets dynamic memory growth if you have > 1 GPU (https://github.com/zzh8829/yolov3-tf2/blob/master/train.py#L46-L47), but since you only have one GPU, it is likely trying to allocate nearly all memory (>90%) at the start.

(2) Some users seem to have experienced this on Windows when there were other TensorFlow or similar processes using the GPU simultaneously, either by you or by other users: https://stackoverflow.com/a/53707323/10993413

(3) As always, make sure your PATH variables are correct. Sometimes if you tried multiple installations and didn't clean things up properly, the PATHs may be finding the wrong version first and cause an issue. If you add new paths to the beginning of PATH, they should be found first: https://www.tensorflow.org/install/gpu#windows_setup

(4) As mentioned by @xenotecc, you could try upgrading to a newer version of cuDNN, though I'm not sure this will help since your config is listed as supported in the TF docs: https://www.tensorflow.org/install/source#gpu. If this does solve it, it may have been a PATH issue after all, since you will likely update the PATHs after installing the newer version.
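
For point (3), one way to see which copies of the cuDNN DLL are reachable is to walk PATH in order and list every directory that contains the file. This is only a minimal sketch using the standard library: the helper name find_on_path is made up for illustration, and cudnn64_7.dll is the file name taken from the log in the question.

```python
import os


def find_on_path(filename, path_value=None, sep=os.pathsep):
    """Return every PATH directory that contains `filename`, in search order."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for directory in path_value.split(sep):
        # PATH entries on Windows are sometimes quoted or padded with spaces.
        directory = directory.strip().strip('"')
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    # File name assumed from the "Successfully opened dynamic library" log line.
    for d in find_on_path("cudnn64_7.dll"):
        print(d)
```

More than one line of output would suggest leftover installations; among PATH entries, the first directory printed is the one searched first.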

unfortunately... gpu.config was taken out and moved to experimental but I have found another way and I'm also going to convert the code to pytorch... I've been using this code as a base for my learnings and pretty much rewrote all of it now. Thank you. — moop, Dec 05 '19 at 18:27

Leon · Answer 3 · 2020-02-05T05:47:39.077

0

I got the same error and resolved it with the following:

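import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
# Cap TensorFlow at 5000 MB on the first GPU instead of letting it reserve nearly all VRAM.
tf.config.experimental.set_virtual_device_configuration(
    gpus[0], [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=5000)])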

(with GTX 1660, 6G memory, tensorflow 2.0.1)
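
The 5000 MB figure is specific to a 6 GB card. A sketch of deriving the cap from a fraction of total VRAM might look like the following. The helper names memory_limit_mb and cap_first_gpu are hypothetical, the total VRAM is passed in by hand rather than queried from the driver, and the TensorFlow calls are the same tf.config.experimental ones used in this answer.

```python
def memory_limit_mb(total_vram_mb, fraction):
    """Return a memory cap in MB as a fraction of total VRAM, rounded down."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_vram_mb * fraction)


def cap_first_gpu(limit_mb):
    """Apply the cap to the first visible GPU; call before the GPU is first used."""
    # Imported here so memory_limit_mb stays usable without TensorFlow installed.
    import tensorflow as tf

    gpus = tf.config.experimental.list_physical_devices('GPU')
    if not gpus:
        return False  # no GPU visible, nothing to configure
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=limit_mb)])
    return True
```

For example, cap_first_gpu(memory_limit_mb(6144, 0.75)) would request a 4608 MB cap on a 6 GB card; which fraction leaves enough headroom for cuDNN is something to find by trial.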

edited Feb 05 '20 at 05:47

answered Feb 05 '20 at 05:40

Leon

score 0 · Answer 4 · answered Feb 19 '20 at 15:24

Simple fix: insert these lines under the imports in "convert.py":

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

This hides the GPU from TensorFlow, so loading and converting the weights runs on the CPU.
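
A slightly more defensive variant sets the variable before TensorFlow is imported, since CUDA reads it when it initializes. This is a sketch only; hide_gpus is a made-up helper name, and the check at the end assumes TensorFlow 2.0 is installed.

```python
import os


def hide_gpus():
    """Hide all CUDA devices from libraries that initialize CUDA after this call."""
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
    return os.environ["CUDA_VISIBLE_DEVICES"]


hide_gpus()  # call this before importing tensorflow

try:
    import tensorflow as tf
    # Expected to be an empty list once the GPUs are hidden.
    print(tf.config.experimental.list_physical_devices('GPU'))
except ImportError:
    print("TensorFlow is not installed; only the environment variable was set.")
```

This trades speed for reliability: the conversion script only runs once, so CPU execution is usually acceptable there, but it does not address the underlying allocation failure for training or inference.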

answered Feb 19 '20 at 15:24

LC117
