Problem with tensorflow 2.0 gpu, unknown error

Question

I'm trying to run tensorflow-gpu 2.0 on Windows 10 in a conda environment. The code is the basic tutorial from the TensorFlow page:

from __future__ import absolute_import, division, print_function, unicode_literals 
import tensorflow as tf

mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)

model.evaluate(x_test,  y_test, verbose=2)

I don't understand the error below, and I have already uninstalled and reinstalled TensorFlow. Could it be that I haven't installed keras-gpu yet? I'm just getting started with this library, so any help is appreciated.

Epoch 1/5
2020-01-24 23:40:35.430377: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2020-01-24 23:40:35.923375: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-01-24 23:40:35.933612: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-01-24 23:40:35.941088: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-01-24 23:40:35.952234: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-01-24 23:40:35.961783: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-01-24 23:40:35.970378: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-01-24 23:40:35.976378: W tensorflow/stream_executor/stream.cc:1919] attempting to perform BLAS operation using StreamExecutor without BLAS support
2020-01-24 23:40:35.986426: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Internal: Blas GEMM launch failed : a.shape=(32, 784), b.shape=(784, 128), m=32, n=128, k=784
         [[{{node sequential/dense/MatMul}}]]
   32/60000 [..............................] - ETA: 2:37:06Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\igorr_z1q8wib\.conda\envs\tf_gpu\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 728, in fit
    use_multiprocessing=use_multiprocessing)
  File "C:\Users\igorr_z1q8wib\.conda\envs\tf_gpu\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 324, in fit
    total_epochs=epochs)
  File "C:\Users\igorr_z1q8wib\.conda\envs\tf_gpu\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 123, in run_one_epoch
    batch_outs = execution_function(iterator)
  File "C:\Users\igorr_z1q8wib\.conda\envs\tf_gpu\lib\site-packages\tensorflow_core\python\keras\engine\training_v2_utils.py", line 86, in execution_function
    distributed_function(input_fn))
  File "C:\Users\igorr_z1q8wib\.conda\envs\tf_gpu\lib\site-packages\tensorflow_core\python\eager\def_function.py", line 457, in __call__
    result = self._call(*args, **kwds)
  File "C:\Users\igorr_z1q8wib\.conda\envs\tf_gpu\lib\site-packages\tensorflow_core\python\eager\def_function.py", line 520, in _call
    return self._stateless_fn(*args, **kwds)
  File "C:\Users\igorr_z1q8wib\.conda\envs\tf_gpu\lib\site-packages\tensorflow_core\python\eager\function.py", line 1823, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "C:\Users\igorr_z1q8wib\.conda\envs\tf_gpu\lib\site-packages\tensorflow_core\python\eager\function.py", line 1141, in _filtered_call
    self.captured_inputs)
  File "C:\Users\igorr_z1q8wib\.conda\envs\tf_gpu\lib\site-packages\tensorflow_core\python\eager\function.py", line 1224, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager)
  File "C:\Users\igorr_z1q8wib\.conda\envs\tf_gpu\lib\site-packages\tensorflow_core\python\eager\function.py", line 511, in call
    ctx=ctx)
  File "C:\Users\igorr_z1q8wib\.conda\envs\tf_gpu\lib\site-packages\tensorflow_core\python\eager\execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError:  Blas GEMM launch failed : a.shape=(32, 784), b.shape=(784, 128), m=32, n=128, k=784
         [[node sequential/dense/MatMul (defined at C:\Users\igorr_z1q8wib\.conda\envs\tf_gpu\lib\site-packages\tensorflow_core\python\framework\ops.py:1751) ]] [Op:__inference_distributed_function_706]

Function call stack:
distributed_function

>>>
>>> model.evaluate(x_test,  y_test, verbose=2)
2020-01-24 23:40:36.878248: I tensorflow/stream_executor/stream.cc:1868] [stream=000002DA3ACFDB20,impl=000002DA3B9C8060] did not wait for [stream=000002DA3ACFD9A0,impl=000002DA3B9C7F70]
2020-01-24 23:40:36.892612: I tensorflow/stream_executor/stream.cc:4816] [stream=000002DA3ACFDB20,impl=000002DA3B9C8060] did not memcpy host-to-device; source: 000002DAA3AF8C80
2020-01-24 23:40:36.901014: F tensorflow/core/common_runtime/gpu/gpu_util.cc:342] CPU->GPU Memcpy failed

Try updating all the packages for the conda environment. Use `conda update --all`. — Saket Khandelwal, Jan 25 '20 at 06:21

craba · Answer 1 · 2020-01-25T07:54:14.750

Igor, are you setting the GPU device?

https://devblogs.nvidia.com/cuda-pro-tip-always-set-current-device-avoid-multithreading-bugs/

https://www.tensorflow.org/guide/gpu

See what devices you have:

from __future__ import absolute_import, division, print_function, unicode_literals

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))

It's possible to set the device manually:

tf.debugging.set_log_device_placement(True)

# Place tensors on the CPU
with tf.device('/CPU:0'):
    a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    c = tf.matmul(a, b)
    print(c)

So your code would be something like:

with tf.device('/CPU:0'):
    mnist = tf.keras.datasets.mnist

    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation='softmax')
    ])

    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=5)

    model.evaluate(x_test,  y_test, verbose=2)
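
Note that pinning everything to '/CPU:0' sidesteps the GPU rather than fixing it. CUBLAS_STATUS_ALLOC_FAILED is typically reported when cuBLAS cannot allocate GPU memory, so if the goal is to keep training on the GPU, one thing that may be worth trying is the memory growth option described in the TensorFlow GPU guide linked above. This is only a minimal sketch, not a confirmed fix for this machine. It assumes it runs in a fresh Python process before any other TensorFlow GPU work, and the variable names are just for illustration:

```python
import tensorflow as tf

# List the GPUs TensorFlow can see (an empty list means CPU only).
gpus = tf.config.experimental.list_physical_devices('GPU')

for gpu in gpus:
    try:
        # Ask TensorFlow to allocate GPU memory as needed instead of
        # grabbing almost all of it up front.
        tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        # Raised if the GPU was already initialized; the setting must be
        # applied before the first GPU operation.
        print(e)

# Read the setting back to confirm it was applied to every GPU.
growth = [tf.config.experimental.get_memory_growth(gpu) for gpu in gpus]
print("Memory growth per GPU:", growth)
```

If this is placed at the top of the script, the original tutorial code can follow it unchanged, without the tf.device('/CPU:0') block.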

Thanks a lot! Looks like I was missing the device placement, as you suggested. If I changed "with tf.device('/CPU:0'):" to "with tf.device('/GPU:0'):", would that automatically select the Nvidia GPU, or the Intel GPU? — Igor Ruiz Rojas, Jan 25 '20 at 19:00
Well, that is going to be specific to your machine, depending on how your system is built. Maybe check out this other post to find out which device is which: https://stackoverflow.com/questions/38559755/how-to-get-current-available-gpus-in-tensorflow . Once you know which device number you prefer, substitute it into the code ('/GPU:0', '/GPU:1', or whatever makes sense). — craba, Jan 26 '20 at 02:32
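
To make the last comment concrete, here is a rough sketch of checking which devices TensorFlow sees and then substituting the chosen name into tf.device. The helper pick_device is hypothetical (it is not part of TensorFlow), and the choice of index 0 is an assumption. As far as I know, the CUDA build of tensorflow-gpu only lists CUDA-capable NVIDIA cards, so an Intel integrated GPU would not normally show up here:

```python
import tensorflow as tf


def pick_device(gpus):
    """Hypothetical helper: use the first GPU if one is visible, else the CPU."""
    return '/GPU:0' if gpus else '/CPU:0'


# Print every GPU TensorFlow can see so you can tell which index is which.
gpus = tf.config.experimental.list_physical_devices('GPU')
for index, gpu in enumerate(gpus):
    print(index, gpu.name, gpu.device_type)

device = pick_device(gpus)

# Run a small matmul on the chosen device as a quick sanity check.
with tf.device(device):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.matmul(a, a)

print("Ran on", device)
print(b.numpy())
```

If the matmul succeeds on '/GPU:0', the same device string can be used around the MNIST code in place of '/CPU:0'.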
