Tensorflow crashes with CUBLAS_STATUS_ALLOC_FAILED

Question

I'm running tensorflow-gpu on Windows 10 using a simple MINST neural network program. When it tries to run, it encounters a CUBLAS_STATUS_ALLOC_FAILED error. A google search doesn't turn up anything.

I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 970
major: 5 minor: 2 memoryClockRate (GHz) 1.253
pciBusID 0000:0f:00.0
Total memory: 4.00GiB
Free memory: 3.31GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0:   Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 970, pci bus id: 0000:0f:00.0)
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_blas.cc:372] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\stream.cc:1390] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
  File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1021, in _do_call
    return fn(*args)
  File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1003, in _run_fn
    status, run_metadata)
  File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : a.shape=(100, 784), b.shape=(784, 256), m=100, n=256, k=784
         [[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_Placeholder_0/_7, Variable/read)]]
         [[Node: Mean/_15 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_35_Mean", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Snympi · Answer 1 · 2021-03-18T12:11:07.900

49

For TensorFlow 2.2 none of the other answers worked when the CUBLAS_STATUS_ALLOC_FAILED problem was encountered. Found a solution on https://www.tensorflow.org/guide/gpu:

import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)

I ran this code before any further calculations are made and found that the same code that produced CUBLAS error before now worked in same session. The sample code above is a specific example that sets the memory growth across a number of physical GPUs but it also solves the memory expansion problem.

edited Mar 18 '21 at 12:11

answered May 25 '20 at 05:19

Snympi

859
13
18

7

In 2020, this was the only solution I found to work. – Trevor Clarke Oct 12 '20 at 19:48
2

This works on several of my apps. Cuda 11.1, cudnn 8.0.5, gpu compute 8.6 3080. – Gregory Alan Bolcer Nov 25 '20 at 02:48
2

Thank you, one connected question, does gpu need to set 'set_memory_growth' flag everytime, when execution? – Santosh K Feb 22 '21 at 12:44
2

I'm using this code every time I start a script that makes use of TensorFlow GPU. – Snympi Feb 24 '21 at 10:18

score 28 · Accepted Answer · edited Jul 28 '20 at 17:50

28

The location of the "allow_growth" property of the session config seems to be different now. It's explained here: https://www.tensorflow.org/tutorials/using_gpu

So currently you'd have to set it like this:

import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)

edited Jul 28 '20 at 17:50

Cadoiz

1,446
21
31

answered Mar 20 '17 at 01:33

Rafal Zajac

1,613
1
16
13

2

session = tf.Session(config=config, ...) ^ SyntaxError: positional argument follows keyword argument solution doesn't work. – Space Bear Oct 11 '18 at 13:48
3

Not working on tf 2.1: tf.__version__ '2.1.0', module 'tensorflow' has no attribute 'ConfigProto' – yeeking May 03 '20 at 11:12
@yee same for `tensorflow-gpu 2.2.0` – Cadoiz Jul 28 '20 at 10:57
module 'tensorflow' has no attribute 'ConfigProto – Stefan Apr 18 '22 at 19:11

score 21 · Answer 3 · edited Feb 26 '22 at 04:06

21

tensorflow>=2.0

import tensorflow as tf
config = tf.compat.v1.ConfigProto(gpu_options = 
                         tf.compat.v1.GPUOptions(per_process_gpu_memory_fraction=0.8)
# device_count = {'GPU': 1}
)
config.gpu_options.allow_growth = True
session = tf.compat.v1.Session(config=config)
tf.compat.v1.keras.backend.set_session(session)

edited Feb 26 '22 at 04:06

Eddie Parker

4,770
3
35
43

answered Jan 02 '20 at 03:59

Welcome_back

1,245
12
18

1

Working with tf 2.1.0, windows 10, 16GB RAM, RTX 2070 Max Q 8GB, but I changed the value to 0.5 – yeeking May 03 '20 at 11:23
Working too on this machine too: https://www.userbenchmark.com/UserRun/30694804 But it gives me `The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.` I think, this should be included in the answer, but the author should decide. – Cadoiz Jul 28 '20 at 11:56
Worked on 2020-Nov on Windows x64 under Python 3.75 with Cuda 10.1 and TensorFlow 2.3 on a RTX 2080 Ti. – Contango Nov 18 '20 at 19:11
Working for me with fraction 0.8. Using TensorFlow 2.4.0, windows 10 on RTX 2060. – Rahat Zaman Jan 12 '21 at 07:32
Worked for me but ending up not using much of the GPU memory (<< 0.8). I had better results with the solution by @Anxifer. TensorFlow 2.4.1. – Kenneth Evans Mar 21 '21 at 19:39

score 10 · Answer 4 · answered Oct 11 '18 at 14:03

10

I found this solution works

import tensorflow as tf
from keras.backend.tensorflow_backend import set_session

config = tf.ConfigProto(
    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.8)
    # device_count = {'GPU': 1}
)
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
set_session(session)

answered Oct 11 '18 at 14:03

Space Bear

755
1
6
18

Not working on for both (tested) `tensorflow 2.1`and `2.2` and giving this error: `AttributeError: module 'tensorflow' has no attribute 'ConfigProto'` – Cadoiz Jul 28 '20 at 11:04

score 4 · Answer 5 · answered Dec 30 '16 at 00:44

4

On windows, currently tensorflow does not allocate all available memory like it says in the documentation, instead you can work around this error by allowing dynamic memory growth as follows:

tf.Session(config=tf.ConfigProto(allow_growth=True))

answered Dec 30 '16 at 00:44

Axiverse

1,589
3
14
30

4

`ConfiProto` appears to be missing this parameter, thereby yielding an error `ValueError: Protocol message ConfigProto has no "allow_growth" field` – Oleg Melnikov Jan 07 '18 at 23:29
Possibly only applicable for TF1, version 2.1 and 2.2 give me the same error, but the answer of Jai Mahesh ( https://stackoverflow.com/users/11280106/jai-mahesh ) worked for me. Link to the answer: https://stackoverflow.com/a/59558128/4575793 – Cadoiz Jul 28 '20 at 11:10

score 4 · Answer 6 · answered Oct 31 '19 at 16:41

None of these fixes worked for me, as it seems that the structure of the tensorflow libraries have changed. For Tensorflow 2.0, the only fix that worked for me was as under Limiting GPU memory growth on this page https://www.tensorflow.org/guide/gpu

For completeness and future-proofing, here's the solution from the docs - I imagine changing memory_limit may be necessary for some people - 1 GB was fine for my case.

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only allocate 1GB of memory on the first GPU
  try:
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)

Thank you so much. This is the only solution that has worked. Don't forget to add 'import tensorflow as tf' if you haven't already when copying this code over. — Sam, May 22 '20 at 18:59

kett · Answer 7 · 2019-05-10T07:19:16.310

2

Tensorflow 2.0 alpha

Allowing GPU memory growth may fix this issue. For Tensorflow 2.0 alpha / nightly there are two methods you can try, to archive this.

1.)

import tensorflow as tf
tf.config.gpu.set_per_process_memory_growth()

2.)

import tensorflow as tf
tf.config.gpu.set_per_process_memory_fraction(0.4) # adjust this to the % of VRAM you 
                                                   # want to give to tensorflow.

I suggest you try both, and see if it helps. Source: https://www.tensorflow.org/alpha/guide/using_gpu

edited May 10 '19 at 07:19

answered May 08 '19 at 08:41

kett

877
3
9
22

I think you mean tf.config.gpu.set_per_process_memory_growth() – Axiverse May 09 '19 at 06:51
in tf.config.gpu.set_per_process_memory_growth() AttributeError: module 'tensorflow_core._api.v2.config' has no attribute 'gpu' – seilgu Sep 23 '19 at 03:42
1

@seilgu now that 2.0 is out of alpha, it's `tf.config.experimental.set_virtual_device_configuration(tf.config.experimental.list_physical_devices('GPU')[0], [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])`: https://www.tensorflow.org/guide/gpu#limiting_gpu_memory_growth – pentavalentcarbon Nov 14 '19 at 22:09

score 2 · Answer 8 · answered Nov 19 '19 at 11:10

2

for keras:

from keras.backend.tensorflow_backend import set_session
import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
set_session(session)

answered Nov 19 '19 at 11:10

Maverick Meerkat

5,737
3
47
66

Not working on for both (tested) tensorflow 2.1and 2.2 and giving this error: AttributeError: module 'tensorflow' has no attribute 'ConfigProto' – Cadoiz Jul 28 '20 at 11:12

score 2 · Answer 9 · answered Feb 15 '20 at 05:12

2

In my case, a stale python process was consuming memory. I killed it through task manager, and things are back to normal.

answered Feb 15 '20 at 05:12

winterlight

460
4
10

score 2 · Answer 10 · answered Feb 16 '21 at 06:35

A bit late to the party but this resolves my issue with tensorflow 2.4.0 and a gtx 980ti. Before limiting the memory I got an error like:

CUBLAS_STATUS_ALLOC_FAILED

My solution was this piece of code:

import tensorflow as tf
gpus = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=4096)])

I found the solution here: https://www.tensorflow.org/guide/gpu

Tensorflow crashes with CUBLAS_STATUS_ALLOC_FAILED

10 Answers10

Linked