53

I'm running tensorflow-gpu on Windows 10 with a simple MNIST neural network program. When it tries to run, it encounters a CUBLAS_STATUS_ALLOC_FAILED error. A Google search doesn't turn up anything.

I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 970
major: 5 minor: 2 memoryClockRate (GHz) 1.253
pciBusID 0000:0f:00.0
Total memory: 4.00GiB
Free memory: 3.31GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0:   Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 970, pci bus id: 0000:0f:00.0)
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_blas.cc:372] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\stream.cc:1390] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
  File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1021, in _do_call
    return fn(*args)
  File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1003, in _run_fn
    status, run_metadata)
  File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : a.shape=(100, 784), b.shape=(784, 256), m=100, n=256, k=784
         [[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_Placeholder_0/_7, Variable/read)]]
         [[Node: Mean/_15 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_35_Mean", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Axiverse

10 Answers

49

For TensorFlow 2.2, none of the other answers worked when I ran into the CUBLAS_STATUS_ALLOC_FAILED problem. I found a solution on https://www.tensorflow.org/guide/gpu:

import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)

I ran this code before any further calculations were made and found that the same code that had produced the CUBLAS error before now worked in the same session. The sample code above sets memory growth across a number of physical GPUs, but it also solves the memory-expansion problem.
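As a side note (my addition, not part of the original answer): the same guide page also documents an environment-variable alternative, `TF_FORCE_GPU_ALLOW_GROWTH`, which enables memory growth without touching the model code. It has to be set before TensorFlow initializes the GPU, so set it before the import:

```python
import os

# Must be set before TensorFlow initializes the GPU,
# i.e. before `import tensorflow as tf` runs.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# import tensorflow as tf  # import only after the variable is set
```

This is convenient when you can't edit the script itself, e.g. when launching someone else's training code.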

Snympi
28

The location of the "allow_growth" property of the session config seems to be different now. It's explained here: https://www.tensorflow.org/tutorials/using_gpu

So currently you'd have to set it like this:

import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)
Cadoiz
Rafal Zajac
21

tensorflow>=2.0

import tensorflow as tf
config = tf.compat.v1.ConfigProto(
    gpu_options=tf.compat.v1.GPUOptions(per_process_gpu_memory_fraction=0.8)
    # device_count={'GPU': 1}
)
config.gpu_options.allow_growth = True
session = tf.compat.v1.Session(config=config)
tf.compat.v1.keras.backend.set_session(session)
Eddie Parker
Welcome_back
  • Working with tf 2.1.0, Windows 10, 16GB RAM, RTX 2070 Max-Q 8GB, but I changed the value to 0.5 – yeeking May 03 '20 at 11:23
  • Working too on this machine too: https://www.userbenchmark.com/UserRun/30694804 But it gives me `The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.` I think, this should be included in the answer, but the author should decide. – Cadoiz Jul 28 '20 at 11:56
  • Worked on 2020-Nov on Windows x64 under Python 3.75 with Cuda 10.1 and TensorFlow 2.3 on a RTX 2080 Ti. – Contango Nov 18 '20 at 19:11
  • Working for me with fraction 0.8. Using TensorFlow 2.4.0, windows 10 on RTX 2060. – Rahat Zaman Jan 12 '21 at 07:32
  • Worked for me but ending up not using much of the GPU memory (<< 0.8). I had better results with the solution by @Anxifer. TensorFlow 2.4.1. – Kenneth Evans Mar 21 '21 at 19:39
10

I found that this solution works:

import tensorflow as tf
from keras.backend.tensorflow_backend import set_session

config = tf.ConfigProto(
    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.8)
    # device_count = {'GPU': 1}
)
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
set_session(session)
Space Bear
  • Not working for both (tested) `tensorflow 2.1` and `2.2`; gives this error: `AttributeError: module 'tensorflow' has no attribute 'ConfigProto'` – Cadoiz Jul 28 '20 at 11:04
4

On Windows, TensorFlow currently does not allocate all available memory as the documentation says; instead, you can work around this error by allowing dynamic memory growth as follows:

tf.Session(config=tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True)))
Axiverse
  • `ConfigProto` appears to be missing this parameter, thereby yielding an error `ValueError: Protocol message ConfigProto has no "allow_growth" field` – Oleg Melnikov Jan 07 '18 at 23:29
  • Possibly only applicable for TF1, version 2.1 and 2.2 give me the same error, but the answer of Jai Mahesh ( https://stackoverflow.com/users/11280106/jai-mahesh ) worked for me. Link to the answer: https://stackoverflow.com/a/59558128/4575793 – Cadoiz Jul 28 '20 at 11:10
4

None of these fixes worked for me, as it seems the structure of the tensorflow libraries has changed. For Tensorflow 2.0, the only fix that worked for me was the one under "Limiting GPU memory growth" on this page: https://www.tensorflow.org/guide/gpu

For completeness and future-proofing, here's the solution from the docs - I imagine changing memory_limit may be necessary for some people - 1 GB was fine for my case.

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only allocate 1GB of memory on the first GPU
  try:
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)
carthurs
  • Thank you so much. This is the only solution that has worked. Don't forget to add 'import tensorflow as tf' if you haven't already when copying this code over. – Sam May 22 '20 at 18:59
2

Tensorflow 2.0 alpha

Allowing GPU memory growth may fix this issue. For Tensorflow 2.0 alpha / nightly there are two methods you can try to achieve this.

1.)

import tensorflow as tf
tf.config.gpu.set_per_process_memory_growth()

2.)

import tensorflow as tf
tf.config.gpu.set_per_process_memory_fraction(0.4) # adjust this to the % of VRAM you 
                                                   # want to give to tensorflow.

I suggest you try both, and see if it helps. Source: https://www.tensorflow.org/alpha/guide/using_gpu

kett
  • I think you mean tf.config.gpu.set_per_process_memory_growth() – Axiverse May 09 '19 at 06:51
  • in tf.config.gpu.set_per_process_memory_growth() AttributeError: module 'tensorflow_core._api.v2.config' has no attribute 'gpu' – seilgu Sep 23 '19 at 03:42
  • @seilgu now that 2.0 is out of alpha, it's `tf.config.experimental.set_virtual_device_configuration(tf.config.experimental.list_physical_devices('GPU')[0], [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])`: https://www.tensorflow.org/guide/gpu#limiting_gpu_memory_growth – pentavalentcarbon Nov 14 '19 at 22:09
2

For Keras:

from keras.backend.tensorflow_backend import set_session
import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
set_session(session)
Maverick Meerkat
  • Not working for both (tested) tensorflow 2.1 and 2.2; gives this error: AttributeError: module 'tensorflow' has no attribute 'ConfigProto' – Cadoiz Jul 28 '20 at 11:12
2

In my case, a stale Python process was consuming GPU memory. I killed it through Task Manager, and things went back to normal.
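To find such a process without opening Task Manager, `nvidia-smi` can list the processes holding GPU memory. Here's a small sketch (my own addition, not part of the answer) that shells out to `nvidia-smi` with its standard `--query-compute-apps` CSV query and parses the result; the `gpu_memory_users` helper name is mine:

```python
import subprocess

def gpu_memory_users(smi_output=None):
    """Return (pid, process_name, used_MiB) tuples for GPU memory users.

    If smi_output is None, invoke nvidia-smi with a per-process CSV query;
    otherwise parse the provided string (useful for testing without a GPU).
    Assumes process names contain no commas.
    """
    if smi_output is None:
        smi_output = subprocess.check_output(
            ["nvidia-smi",
             "--query-compute-apps=pid,process_name,used_memory",
             "--format=csv,noheader,nounits"],
            text=True)
    procs = []
    for line in smi_output.strip().splitlines():
        if not line.strip():
            continue
        pid, name, mem = [field.strip() for field in line.split(",")]
        procs.append((int(pid), name, int(mem)))
    return procs

# Parsing example with captured output (no GPU needed):
sample = "2148, python.exe, 3012\n4096, chrome.exe, 233\n"
for pid, name, mem in gpu_memory_users(sample):
    print(pid, name, mem)
```

Once you spot a stale entry, kill it by PID (e.g. `taskkill /PID 2148 /F` on Windows).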

winterlight
2

A bit late to the party, but this resolved my issue with tensorflow 2.4.0 and a GTX 980 Ti. Before limiting the memory I got an error like:

CUBLAS_STATUS_ALLOC_FAILED

My solution was this piece of code:

import tensorflow as tf
gpus = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=4096)])

I found the solution here: https://www.tensorflow.org/guide/gpu

Anxifer