
Recently, I have been trying to reproduce a deep learning experiment from GitHub. However, every time I run the experiment, I get the following error:

2018-08-27 09:32:16.827025: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_dnn.cc:332] could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED

In this situation, I set up the TensorFlow session as follows:

sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=False))

If I try to limit the GPU memory as follows, I find that I do not have enough memory to run my model:

gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
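A commonly suggested alternative to reserving a fixed fraction (a sketch, assuming TensorFlow 1.x as used in the question) is to let the allocator grow GPU memory on demand, which avoids both over-reserving and the fixed cap:

```python
import tensorflow as tf

# Allocate GPU memory on demand instead of grabbing a
# fixed fraction of it up front at session creation.
gpu_options = tf.GPUOptions(allow_growth=True)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
```

Note this only changes how device memory is reserved; if the failure is in host memory, as the accepted answer suggests, it may not help.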

The information about my GPU is as follows. I am not sure where the problem is, and I have run into it several times. Thank you for your help!

2018-08-27 09:31:45.966248: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018-08-27 09:31:46.199314: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1392] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:01:00.0
totalMemory: 11.00GiB freeMemory: 9.09GiB
Sean
  • Related thread, with solution: https://stackoverflow.com/questions/41117740/tensorflow-crashes-with-cublas-status-alloc-failed – yeeking May 03 '20 at 11:29

2 Answers


Sean, according to the cuDNN documentation, the error status CUDNN_STATUS_ALLOC_FAILED indicates a problem with host memory, not device memory. Check your RAM as well.
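To verify whether host memory is actually scarce before the failing run, a minimal sketch (Linux-only, using the standard library; on Windows the Performance tab of Task Manager gives the same number) is:

```python
import os

# Rough estimate of currently available host RAM on Linux.
# SC_PAGE_SIZE is the page size in bytes; SC_AVPHYS_PAGES is
# the number of physical pages currently available.
page_size = os.sysconf("SC_PAGE_SIZE")
avail_pages = os.sysconf("SC_AVPHYS_PAGES")
free_gib = page_size * avail_pages / 2**30
print(f"Approximate free host memory: {free_gib:.1f} GiB")
```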

Pradeep Kumar
  • I confirm this is correct. It is not about the memory of the GPU but the memory of the computer. Increase your DDR3 or DDR4 RAM to 32 GB, 64 GB or even 128 GB – James Nov 26 '18 at 10:18
  • According to the Performance tab in the Task Manager in Windows 10, I'm only using 6.0 out of 16.0 GB, so I have 10.0 GB free memory, and I still get this error. On my laptop (which also uses Windows 10), I only have 8 GB in total, and I don't get this error when using the same model. So this can't be the entire truth. – HelloGoodbye Aug 17 '19 at 15:23
  • @HelloGoodbye The error status corresponds to both host and device memory; it depends on which function is called. For example, `cudnnCreate` returns it if the host memory allocation fails. Check which function returns the error code. – Pradeep Kumar Aug 17 '19 at 15:36
  • I restarted the computer and now it works again. I'm wondering if it may have something to do with whether TensorFlow exited gracefully. On Linux, when I press the "stop" button (the red square) in PyCharm, an exception seems to be raised that the script is then able to catch. On Windows, the script doesn't manage to catch any such exception. Maybe this gives rise to a memory leak on the GPU that causes this problem? Is there any way to clean up old allocated memory on the GPU before trying to use it in that case? – HelloGoodbye Aug 17 '19 at 17:12

In my case, this was due to two TensorFlow processes using the GPU simultaneously (started either by you or by other users): https://stackoverflow.com/a/53707323/10993413

Source: https://forums.developer.nvidia.com/t/could-not-create-cudnn-handle-cudnn-status-alloc-failed/108261

diman82