
Been searching the web for hours with no results, so figured I'd ask here.

I'm trying to make a self-driving car following Sentdex's tutorial, but when running the model I get a bunch of fatal errors. I've searched all over the internet for a solution, and many people seem to have the same problem. However, none of the solutions I've found (including this Stack Overflow post) work for me.

Here is my software:

  • TensorFlow: 1.5, GPU version
  • CUDA: 9.0, with the patch
  • cuDNN: 7
  • Windows 10 Pro
  • Python 3.6

Hardware:

  • Nvidia GTX 1070 Ti, with the latest drivers
  • Intel i5 7600K

Here is the crash log:

2018-02-04 16:29:33.606903: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_blas.cc:444] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2018-02-04 16:29:33.608872: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_blas.cc:444] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2018-02-04 16:29:33.609308: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_blas.cc:444] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2018-02-04 16:29:35.145249: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
2018-02-04 16:29:35.145563: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2018-02-04 16:29:35.149896: F C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\kernels\conv_ops.cc:717] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)

Here's my code:

    import tensorflow as tf
    import numpy as np
    import cv2
    import time
    from PIL import ImageGrab
    from getkeys import key_check
    from alexnet import alexnet
    import os
    from sendKeys import PressKey, ReleaseKey, W, A, S, D, Sp

    import random

    WIDTH = 80
    HEIGHT = 60
    LR = 1e-3
    EPOCHS = 10
    MODEL_NAME = 'DiRT-AI-Driver-{}-{}-{}-epochs.model'.format(LR, 'alexnetv2', EPOCHS)

    # Each helper presses one control key and releases all the others.
    def straight():
        PressKey(W)
        ReleaseKey(A)
        ReleaseKey(S)
        ReleaseKey(D)
        ReleaseKey(Sp)

    def left():
        PressKey(A)
        ReleaseKey(W)
        ReleaseKey(S)
        ReleaseKey(D)
        ReleaseKey(Sp)

    def right():
        PressKey(D)
        ReleaseKey(A)
        ReleaseKey(S)
        ReleaseKey(W)
        ReleaseKey(Sp)

    def brake():
        PressKey(S)
        ReleaseKey(A)
        ReleaseKey(W)
        ReleaseKey(D)
        ReleaseKey(Sp)

    def handbrake():
        PressKey(Sp)
        ReleaseKey(A)
        ReleaseKey(S)
        ReleaseKey(D)
        ReleaseKey(W)

    model = alexnet(WIDTH, HEIGHT, LR)
    model.load(MODEL_NAME)


    def main():
        last_time = time.time()
        # Countdown so there is time to switch to the game window.
        for i in list(range(4))[::-1]:
            print(i+1)
            time.sleep(1)

        paused = False
        while(True):
            if not paused:
                screen = np.array(ImageGrab.grab(bbox=(0,40,1024,768)))
                screen = cv2.cvtColor(screen,cv2.COLOR_BGR2GRAY)
                screen = cv2.resize(screen,(80,60))
                print('Loop took {} seconds'.format(time.time()-last_time))
                last_time = time.time()
                print('took time')
                # This is the line that triggers the crash described below.
                prediction = model.predict([screen.reshape(WIDTH,HEIGHT,1)])[0]
                print('predicted')
                moves = list(np.around(prediction))
                print('got moves')
                print(moves,prediction)

                if moves == [1,0,0,0,0]:
                    straight()
                elif moves == [0,1,0,0,0]:
                    left()
                elif moves == [0,0,1,0,0]:
                    brake()
                elif moves == [0,0,0,1,0]:
                    right()
                elif moves == [0,0,0,0,1]:
                    handbrake()

            keys = key_check()

            # 'T' toggles pausing; release all keys when pausing.
            if 'T' in keys:
                if paused:
                    paused = False
                    time.sleep(1)
                else:
                    paused = True
                    ReleaseKey(W)
                    ReleaseKey(A)
                    ReleaseKey(S)
                    ReleaseKey(D)
                    ReleaseKey(Sp)
                    time.sleep(1)


    main()

I've found that the line that crashes Python and produces the first three errors is this one:

  • prediction = model.predict([screen.reshape(WIDTH,HEIGHT,1)])[0]

When running the code, the CPU goes up to a whopping 100%, suggesting that something is seriously off. GPU usage sits at about 40-50%.

I've tried TensorFlow 1.2 and 1.3, as well as CUDA 8, with no luck. When installing CUDA I don't install the bundled drivers, since they are too old for my GPU. I've tried different cuDNN versions too, with no luck.

  • "*When running the code, the CPU goes up to a whopping 100%, suggesting that something is seriously off*" – why so? High CPU loads are fine even when you use a GPU. – Eli Korvigo Feb 04 '18 at 16:36
  • The only times I've seen spikes from idle to 100% CPU have been on infinite loops, but if you say it's normal in this case, it should be just fine and shouldn't be part of the problem. – Gnoske Feb 04 '18 at 17:53

6 Answers

13

In my case, the issue happened because another Python console with TensorFlow imported was running. Closing it solved the problem.

I'm on Windows 10; the main errors were:

failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED

Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
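
If you want to confirm that another process is actually holding GPU memory before closing consoles at random, nvidia-smi can list the compute processes on the card. A minimal sketch of such a check from Python (my addition, not part of the original answer; it assumes nvidia-smi is on your PATH):

# List processes currently holding GPU memory (assumes nvidia-smi is on PATH).
import subprocess

result = subprocess.run(
    ["nvidia-smi",
     "--query-compute-apps=pid,process_name,used_memory",
     "--format=csv,noheader"],
    stdout=subprocess.PIPE, universal_newlines=True, check=True)

print(result.stdout.strip() or "No compute processes are using the GPU.")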

8

You're probably running out of GPU memory.


If you're using TensorFlow 1.x:

1st option) set allow_growth to true.

import tensorflow as tf    
config = tf.ConfigProto()
config.gpu_options.allow_growth=True
sess = tf.Session(config=config)

2nd option) set memory fraction.

# change the memory fraction as you want

import tensorflow as tf
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.3)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

If you're using TensorFlow 2.x:

1st option) enable memory growth with set_memory_growth.

# Currently the 'memory growth' option should be the same for all GPUs.
# You should set the 'memory growth' option before initializing GPUs.

import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  try:
    for gpu in gpus:
      tf.config.experimental.set_memory_growth(gpu, True)
  except RuntimeError as e:
    print(e)

2nd option) set memory_limit as you want. Just change the index of gpus and memory_limit in this code below.

import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  try:
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
  except RuntimeError as e:
    print(e)
  • Since there are multiple options here, I just wanted to specify: on TF 2.4, the set_memory_growth option worked for me. – tonyd24601 Mar 04 '21 at 17:10
2

Try setting:

os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'

This solved my problem.

My environment:

  • cuDNN 7.6.5
  • TensorFlow 2.4
  • CUDA Toolkit 10.1
  • RTX 2060
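
Note that for this variable to have any effect it has to be set before TensorFlow initializes the GPU, so the safest place is at the very top of the script, before anything touches TensorFlow. A minimal sketch (my addition, not from the original answer; the device listing is only there to show the allocator is already configured):

# Set the allocator behaviour before TensorFlow initializes the GPU.
import os
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'

import tensorflow as tf  # imported after the variable is set

print(tf.config.list_physical_devices('GPU'))
# ... build and run the model as usual ...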

1

Try adding the CUDA path to your environment variables. It seems the problem is with CUDA.

Set the CUDA path in ~/.bashrc (edit it with nano):

# CUDA Nvidia path
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"
export CUDA_HOME=/usr/local/cuda
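
Once the paths are set (and the shell reloaded), a quick way to check whether TensorFlow can actually load CUDA and see the GPU is to list the local devices. This is a generic TF 1.x check, not part of the original answer:

# Print every device TensorFlow can see; a device_type of 'GPU' means CUDA/cuDNN loaded.
from tensorflow.python.client import device_lib

for device in device_lib.list_local_devices():
    print(device.name, device.device_type)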
  • I removed everything CUDA-related, went into the %PATH% and cleared all CUDA-variables. After reinstalling everything, it now finally works! The problem was the amount of paths I had, from previous attempts. They probably clashed with each other. – Gnoske Feb 05 '18 at 14:47
  • Well, seems I was too quick there! Now it works maybe 20% of the time. The other runs, I get the same crash. Better, but still not working as intended! – Gnoske Feb 05 '18 at 15:21
  • Sorry, I forgot to say that after editing .bashrc you may need to run `source ~/.bashrc`. Make sure that you have only one declaration of each environment variable. – David Jimenez Feb 05 '18 at 16:24
  • I have the same problem on Win10, so how can I add new environment variables on Win10? – ShuangSong Feb 06 '18 at 15:56
  • There is no "lib64" folder in CUDA's root path on Win10. – ShuangSong Feb 06 '18 at 16:05
  • I'm also on Windows 10 and did the same steps there. To change/add environment variables, simply search for "environment variables" and it'll pop up. However, this did not solve my problem; I removed all duplicates and the problem still remains. – Gnoske Feb 06 '18 at 16:33
  • For Windows you need to add these to your Path environment variable: `C:\dev\cuda` and `C:\dev\cuda\bin`. http://www.netinstructions.com/how-to-install-and-run-gpu-enabled-tensorflow-on-windows/ – David Jimenez Feb 08 '18 at 10:41
1

I encountered the same problem, and then found out it was because I was also using the GPU for other things, even though Task Manager (Windows) didn't show them as using the GPU. This could be anything like rendering videos, video encoding, playing a heavy game, coin mining, and so on. If something else is still putting a heavy load on the GPU, close it and the problem should be solved.

1

I had an almost identical problem. Fixed it by reinstalling tensorflow-gpu.

conda uninstall tensorflow-gpu
conda install tensorflow-gpu

I think pip should work as well.