I'm using Keras to train a convolutional neural network with the fit_generator function, since the images are stored in .h5 files and don't fit in memory. Most of the time I'm not able to train the model: it either gets stuck in the middle of the first epoch or crashes with 'GPU sync failed' or 'CUDA_ERROR_LAUNCH_FAILED' (see the logs below). Training on the CPU works fine, but it is of course slower. I'm using two different machines and both show the same issues. My guess is that it is an installation/configuration problem, but I don't know how to fix it.
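For context, the generators I pass to fit_generator read batches directly from the .h5 files. This is only a simplified sketch of what they do (class and dataset names are illustrative, not my exact code):

import h5py
import numpy as np
from keras.utils import Sequence

# Simplified sketch of an HDF5-backed generator; dataset names are illustrative.
class H5BatchGenerator(Sequence):
    def __init__(self, h5_path, batch_size=32):
        self.file = h5py.File(h5_path, 'r')
        self.images = self.file['images']   # e.g. shape (N, 1, rows, cols), channels_first
        self.labels = self.file['labels']   # e.g. shape (N,)
        self.batch_size = batch_size

    def __len__(self):
        return int(np.ceil(len(self.labels) / float(self.batch_size)))

    def __getitem__(self, idx):
        sl = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        # Only the requested batch is read from disk, so the full dataset
        # never has to fit in memory.
        return np.asarray(self.images[sl]), np.asarray(self.labels[sl])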
On both machines TensorFlow was installed as explained here: https://www.anaconda.com/blog/developer-blog/tensorflow-in-anaconda/
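To double-check that the conda-installed TensorFlow sees the GPU at all, I can run a minimal check like the following inside the same environment (plain TensorFlow 1.x calls, nothing project-specific):

import tensorflow as tf
from tensorflow.python.client import device_lib

# Print the TF version, whether a GPU is usable, and the devices TF can see.
print(tf.VERSION)
print(tf.test.is_gpu_available())
print([d.name for d in device_lib.list_local_devices()])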
I have used this script https://github.com/tensorflow/tensorflow/blob/master/tools/tf_env_collect.sh to collect the following information.
Here is the content of tf_env.txt.
First machine:
Keras 2.2.4.
== cat /etc/issue ===============================================
Linux liph02.novalocal 3.10.0-862.9.1.el7.x86_64 #1 SMP Mon Jul 16 16:29:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
VERSION="7 (Core)"
VERSION_ID="7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
== are we in docker =============================================
No
== compiler =====================================================
c++ (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
== uname -a =====================================================
Linux liph02.novalocal 3.10.0-862.9.1.el7.x86_64 #1 SMP Mon Jul 16 16:29:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
== check pips ===================================================
numpy 1.15.4
numpydoc 0.8.0
protobuf 3.6.1
tensorflow 1.12.0
== check for virtualenv =========================================
False
== tensorflow import ============================================
tf.VERSION = 1.12.0
tf.GIT_VERSION = b'unknown'
tf.COMPILER_VERSION = b'unknown'
Sanity check: array([1], dtype=int32)
== env ==========================================================
LD_LIBRARY_PATH /usr/local/cuda-9.2/lib64
DYLD_LIBRARY_PATH is unset
== nvidia-smi ===================================================
Fri Dec 28 16:13:39 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.78 Driver Version: 410.78 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN Xp Off | 00000000:00:06.0 Off | N/A |
| 22% 38C P0 57W / 250W | 0MiB / 12196MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
== cuda libs ===================================================
/usr/local/Wolfram/Mathematica/11.3/SystemFiles/Components/MXNetLink/LibraryResources/Linux-x86-64/libcudart.so.9.1
Second machine:
Keras 2.2.4.
== cat /etc/issue ===============================================
Linux liph01.novalocal 3.10.0-862.14.4.el7.x86_64 #1 SMP Wed Sep 26 15:12:11 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
VERSION="7 (Core)"
VERSION_ID="7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
== are we in docker =============================================
No
== compiler =====================================================
c++ (GCC) 7.3.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
== uname -a =====================================================
Linux liph01.novalocal 3.10.0-862.14.4.el7.x86_64 #1 SMP Wed Sep 26 15:12:11 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
== check pips ===================================================
msgpack-numpy 0.4.3.2
numpy 1.15.3
numpydoc 0.8.0
protobuf 3.6.0
tensorflow 1.11.0
== check for virtualenv =========================================
False
== tensorflow import ============================================
tf.VERSION = 1.11.0
tf.GIT_VERSION = b'unknown'
tf.COMPILER_VERSION = b'unknown'
== env ==========================================================
LD_LIBRARY_PATH is unset
DYLD_LIBRARY_PATH is unset
== nvidia-smi ===================================================
Thu Jan 3 17:38:44 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48 Driver Version: 410.48 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN Xp Off | 00000000:00:07.0 Off | N/A |
| 40% 65C P2 94W / 250W | 11747MiB / 12196MiB | 90% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 16991 C python 11737MiB |
+-----------------------------------------------------------------------------+
== cuda libs ===================================================
/usr/local/cuda-9.2/targets/x86_64-linux/lib/libcudart_static.a
/usr/local/cuda-9.2/targets/x86_64-linux/lib/libcudart.so.9.2.148
/usr/local/cuda-9.2/doc/man/man7/libcudart.7
/usr/local/cuda-9.2/doc/man/man7/libcudart.so.7
Here are the two stack traces:
(dev) -bash-4.2$ python classifier_training.py --dirs /data/simulations/Paranal_gam/ /data/simulations/Paranal_prot/ --epochs 1 --batch_size 32 --workers 16 --model ClassifierV2 --patience 1
Using TensorFlow backend.
ClassifierV2
Building training generator...
Building validation generator...
2018-12-18 12:15:19.553286: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
2018-12-18 12:15:20.043811: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-12-18 12:15:20.047991: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: TITAN Xp major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:00:06.0
totalMemory: 11.91GiB freeMemory: 11.75GiB
2018-12-18 12:15:20.048093: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
Traceback (most recent call last):
File "classifier_training.py", line 122, in <module>
model = class_v2.get_model()
File "/data/ctasoft/cta-lstchain/cnn/classifiers.py", line 40, in get_model
self.model.add(Conv2D(16, kernel_size=(3, 3), input_shape=(1, self.img_rows, self.img_cols), data_format='channels_first', activation='relu'))
File "/data/ctasoft/anaconda3/envs/cta-dev/lib/python3.6/site-packages/keras/engine/sequential.py", line 165, in add
layer(x)
File "/data/ctasoft/anaconda3/envs/cta-dev/lib/python3.6/site-packages/keras/engine/base_layer.py", line 457, in __call__
output = self.call(inputs, **kwargs)
File "/data/ctasoft/anaconda3/envs/cta-dev/lib/python3.6/site-packages/keras/layers/convolutional.py", line 171, in call
dilation_rate=self.dilation_rate)
File "/data/ctasoft/anaconda3/envs/cta-dev/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 3641, in conv2d
x, tf_data_format = _preprocess_conv2d_input(x, data_format)
File "/data/ctasoft/anaconda3/envs/cta-dev/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 3521, in _preprocess_conv2d_input
if not _has_nchw_support() or force_transpose:
File "/data/ctasoft/anaconda3/envs/cta-dev/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 292, in _has_nchw_support
gpus_available = len(_get_available_gpus()) > 0
File "/data/ctasoft/anaconda3/envs/cta-dev/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 278, in _get_available_gpus
_LOCAL_DEVICES = get_session().list_devices()
File "/data/ctasoft/anaconda3/envs/cta-dev/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 186, in get_session
_SESSION = tf.Session(config=config)
File "/data/ctasoft/anaconda3/envs/cta-dev/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1551, in __init__
super(Session, self).__init__(target, graph, config=config)
File "/data/ctasoft/anaconda3/envs/cta-dev/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 676, in __init__
self._session = tf_session.TF_NewSessionRef(self._graph._c_graph, opts)
tensorflow.python.framework.errors_impl.InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: unspecified launch failure
(dev) -bash-4.2$ python classifier_training.py --dirs /data/simulations/Paranal_gam /data/simulations/Paranal_prot --workers 1 --epochs 10 --batch_size 16 --model ClassifierV2 --patience 9
Using TensorFlow backend.
ClassifierV2
Building training generator...
Building validation generator...
2018-12-29 19:29:11.142008: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
2018-12-29 19:29:11.892617: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-12-29 19:29:11.896828: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: TITAN Xp major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:00:06.0
totalMemory: 11.91GiB freeMemory: 11.75GiB
2018-12-29 19:29:11.896880: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-12-29 19:29:12.960736: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-29 19:29:12.960804: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2018-12-29 19:29:12.960819: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2018-12-29 19:29:12.961681: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11366 MB memory) -> physical GPU (device: 0, name: TITAN Xp, pci bus id: 0000:00:06.0, compute capability: 6.1)
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 16, 98, 98) 160
_________________________________________________________________
conv2d_2 (Conv2D) (None, 16, 96, 96) 2320
_________________________________________________________________
average_pooling2d_1 (Average (None, 16, 48, 48) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 16, 48, 48) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 32, 46, 46) 4640
_________________________________________________________________
conv2d_4 (Conv2D) (None, 32, 44, 44) 9248
_________________________________________________________________
average_pooling2d_2 (Average (None, 32, 22, 22) 0
_________________________________________________________________
dropout_2 (Dropout) (None, 32, 22, 22) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 15488) 0
_________________________________________________________________
dense_1 (Dense) (None, 128) 1982592
_________________________________________________________________
dropout_3 (Dropout) (None, 128) 0
_________________________________________________________________
dense_2 (Dense) (None, 256) 33024
_________________________________________________________________
dropout_4 (Dropout) (None, 256) 0
_________________________________________________________________
dense_3 (Dense) (None, 1) 257
=================================================================
Total params: 2,032,241
Trainable params: 2,032,241
Non-trainable params: 0
_________________________________________________________________
Epoch 1/10
4/8065 [..............................] - ETA: 1:52:06 - loss: 0.9940 - acc: 0.4531 - precision: 0.4947 - recall: 0.7188
2018-12-29 19:29:54.459471: E tensorflow/stream_executor/cuda/cuda_event.cc:48] Error polling for event status: failed to query event: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2018-12-29 19:29:54.459645: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:274] Unexpected Event status: 1
Aborted
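For what it's worth, the first stack trace shows the failure happening while Keras creates its TensorFlow session, so a stripped-down equivalent of that step would be something like this (my reading of the trace, not my actual training script):

import tensorflow as tf

# Mirrors what keras.backend.tensorflow_backend.get_session() does:
# build a config and open a session, which is where the first trace
# reports "CUDA runtime implicit initialization on GPU:0 failed".
config = tf.ConfigProto(allow_soft_placement=True)
sess = tf.Session(config=config)
print(sess.list_devices())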