CNN keras model.fit() with GPU causes exit code -1073740791 (0xC0000409)

Question

Running my model.fit() function for my CNN results in:

Process finished with exit code **-1073740791 (0xC0000409)**

Installed:

NVIDIA CUDA 11.5.1
NVIDIA cuDNN 8.3.2.44
TensorFlow 2.7.0
Python 3.9.9

Manually switching the device to the CPU works: the model runs. The GPU is also detected, which I tested with:

import tensorflow as tf
tf.test.is_gpu_available()
tf.config.experimental.list_physical_devices()
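A minimal sketch of the CPU workaround described above, assuming a small hypothetical CNN and random placeholder data in place of the question's model. Creating and training the model inside a `tf.device("/CPU:0")` scope should place its ops on the CPU even when a GPU is visible. Note that `tf.test.is_gpu_available()` is deprecated in TF 2.x in favor of `tf.config.list_physical_devices("GPU")`.

```python
import numpy as np
import tensorflow as tf

# Non-deprecated form of the GPU check from the question.
print("GPUs:", tf.config.list_physical_devices("GPU"))


def build_model() -> tf.keras.Model:
    """Hypothetical tiny CNN standing in for the question's model."""
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(8, 3, activation="relu", input_shape=(32, 32, 3)),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


# Random placeholder data: 64 RGB images of 32x32 with integer labels 0..9.
x = np.random.rand(64, 32, 32, 3).astype("float32")
y = np.random.randint(0, 10, size=(64,))

# Build and train inside a CPU device scope so the training ops run on the CPU.
with tf.device("/CPU:0"):
    model = build_model()
    history = model.fit(x, y, epochs=1, batch_size=16, verbose=0)

print("loss after one epoch:", history.history["loss"][0])
```

If this runs but the same script without the device scope crashes, the problem is on the GPU code path rather than in the model definition.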

Some of the log output before the model crashes:

gradient_tape/sequential/batch_normalization_2/moments/BroadcastGradientArgs/s1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
gradient_tape/sequential/flatten/Shape: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/gradients/zeros: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/gradients/zeros_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/gradients/zeros_2: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/gradients/zeros_3: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/gradients/zeros_4: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/gradients/zeros_5: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/gradients/zeros_6: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/gradients/zeros_7: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/add/y: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/sub/x: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/sub_1/x: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/sub_2/x: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/sub_3/x: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/Adam/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
ExpandDims_1/dim: (Const): /job:localhost/replica:0/task:0/device:GPU:0
ArgMax/dimension: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Size: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
batch_loss/write_summary/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
batch_accuracy/write_summary/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.048605: I tensorflow/core/common_runtime/placer.cc:114] gradient_tape/sequential/flatten/Shape: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.048845: I tensorflow/core/common_runtime/placer.cc:114] Adam/gradients/zeros: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.049069: I tensorflow/core/common_runtime/placer.cc:114] Adam/gradients/zeros_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.049295: I tensorflow/core/common_runtime/placer.cc:114] Adam/gradients/zeros_2: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.049521: I tensorflow/core/common_runtime/placer.cc:114] Adam/gradients/zeros_3: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.049748: I tensorflow/core/common_runtime/placer.cc:114] Adam/gradients/zeros_4: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.049973: I tensorflow/core/common_runtime/placer.cc:114] Adam/gradients/zeros_5: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.050199: I tensorflow/core/common_runtime/placer.cc:114] Adam/gradients/zeros_6: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.050424: I tensorflow/core/common_runtime/placer.cc:114] Adam/gradients/zeros_7: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.050638: I tensorflow/core/common_runtime/placer.cc:114] Adam/add/y: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.050841: I tensorflow/core/common_runtime/placer.cc:114] Adam/sub/x: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.051047: I tensorflow/core/common_runtime/placer.cc:114] Adam/sub_1/x: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.051252: I tensorflow/core/common_runtime/placer.cc:114] Adam/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.051453: I tensorflow/core/common_runtime/placer.cc:114] Adam/sub_2/x: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.051655: I tensorflow/core/common_runtime/placer.cc:114] Adam/sub_3/x: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.051861: I tensorflow/core/common_runtime/placer.cc:114] Adam/Adam/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.052072: I tensorflow/core/common_runtime/placer.cc:114] ExpandDims_1/dim: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.052283: I tensorflow/core/common_runtime/placer.cc:114] ArgMax/dimension: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.052493: I tensorflow/core/common_runtime/placer.cc:114] Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.052681: I tensorflow/core/common_runtime/placer.cc:114] Size: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.052872: I tensorflow/core/common_runtime/placer.cc:114] Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.053087: I tensorflow/core/common_runtime/placer.cc:114] batch_loss/write_summary/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.053358: I tensorflow/core/common_runtime/placer.cc:114] batch_accuracy/write_summary/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.138176: I tensorflow/core/common_runtime/eager/execute.cc:1224] Executing op __inference_train_function_1479 in device /job:localhost/replica:0/task:0/device:GPU:0

Process finished with exit code -1073740791 (0xC0000409)

I also tested a basic matmul example on the GPU and it worked. I suspect a version conflict between CUDA and cuDNN. The solution from "Tensorflow 2.5 exit code -1073740791 when GPU training" did not work for me.
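A passing matmul test may not rule out a cuDNN problem: on the GPU, dense matrix multiplication is typically handled by cuBLAS, while convolutions (the core of a CNN) are normally dispatched to cuDNN. A minimal sketch that exercises both on the same device, assuming nothing beyond TensorFlow itself; it falls back to the CPU when no GPU is visible so it runs anywhere.

```python
import tensorflow as tf

# Use the GPU if one is visible, otherwise the CPU.
device = "/GPU:0" if tf.config.list_physical_devices("GPU") else "/CPU:0"

with tf.device(device):
    # Dense matmul: on the GPU this usually goes through cuBLAS.
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[5.0, 6.0], [7.0, 8.0]])
    product = tf.matmul(a, b)

    # 2-D convolution: on the GPU this is normally dispatched to cuDNN.
    images = tf.ones([1, 8, 8, 1])  # NHWC: batch, height, width, channels
    kernel = tf.ones([3, 3, 1, 1])  # height, width, in_channels, out_channels
    conv = tf.nn.conv2d(images, kernel, strides=1, padding="VALID")

print(product.device, product.numpy().tolist())
print(conv.device, conv.shape)
```

If the matmul succeeds but the convolution crashes the process with the same exit code, that points towards the cuDNN installation rather than CUDA in general.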

Requested output for:

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

Num GPUs Available:  1

Thanks for any advice!

Could you provide some minimal code to reproduce the error, and also let us know which platform you are running this code on? — , Jan 04 '22 at 16:28
Hi, it's basically this code with some slight changes: https://www.kaggle.com/ahmedzakaria/image-classification-using-cnn-accuracy-0-84/notebook But I don't think it has anything to do with the code itself. The code works fine on the CPU; the error only occurs when I run it on the GPU. I think it might be a problem with the versions of the third-party packages (CUDA/cuDNN) or TensorFlow. — Lucas Winkler, Jan 12 '22 at 11:00
Yes, your [build configuration](https://www.tensorflow.org/install/source_windows#gpu) is not compatible. For `Python 3.9` and `Tensorflow 2.7` it requires `CUDA 11.2` and `cuDNN 8.1`. Follow the steps on the TensorFlow site to install CUDA for the [Windows setup](https://www.tensorflow.org/install/gpu#windows_setup). — , Jan 12 '22 at 11:28
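One way to check which CUDA and cuDNN versions an installed TensorFlow binary was compiled against, so they can be compared with the tested configuration linked in the comment, is sketched below. It only reads build metadata; the `cuda_version` and `cudnn_version` keys are expected on CUDA builds and may be absent on CPU-only builds, hence `.get()`.

```python
import tensorflow as tf

# Was this TensorFlow binary compiled with CUDA support at all?
print("Built with CUDA:", tf.test.is_built_with_cuda())

# Versions the binary was compiled against; compare these with the
# CUDA/cuDNN versions actually installed on the machine.
build = tf.sysconfig.get_build_info()
print("CUDA version: ", build.get("cuda_version"))
print("cuDNN version:", build.get("cudnn_version"))
```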
So I uninstalled the newer versions and installed CUDA 11.2 and cuDNN 8.1.1.33. The environment variables are set as well, and the GPU driver is updated. I still get the same error, even though the logs (with debug logging enabled) show that the GPU is detected. I just added the logs to the question; maybe this helps. — Lucas Winkler, Jan 12 '22 at 15:16
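The comment does not say how debug logging was enabled. One common way, assumed here, is device-placement logging, which produces lines similar to the `placer.cc` and `Executing op ... in device` entries shown in the question. A minimal sketch:

```python
import os

# Show all messages from TensorFlow's C++ runtime (0 = most verbose).
# This needs to be set before TensorFlow is first imported to take effect.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "0"

import tensorflow as tf

# Log the device every op is placed on.
tf.debugging.set_log_device_placement(True)

x = tf.constant([1.0, 2.0, 3.0])
y = x * 2.0  # the log should name the device this multiplication ran on
print(y.numpy())
```

The last placement line before the crash (here, `__inference_train_function_1479` on `GPU:0`) shows that the whole training step was dispatched to the GPU; it does not by itself identify which kernel failed.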
Please provide the output of this code: `import tensorflow as tf` `print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))` — , Jan 13 '22 at 04:52
Done; it recognizes the GPU. I also added the log output from when I start the code. If I interpret it correctly, it uses the GPU for some operations until the process exits. — Lucas Winkler, Jan 13 '22 at 10:49
Could you please check [this](https://stackoverflow.com/questions/50562192/process-finished-with-exit-code-1073740791-0xc0000409-pycharm-error) similar resolved issue? It describes the error as a stack buffer overflow. — , May 20 '22 at 11:00
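The exit code reported by the IDE is a signed 32-bit value; reinterpreting it as unsigned gives the Windows NTSTATUS code 0xC0000409 (`STATUS_STACK_BUFFER_OVERRUN`). A small sketch of that conversion in plain Python (the helper name `to_ntstatus` is hypothetical):

```python
def to_ntstatus(exit_code: int) -> str:
    """Reinterpret a signed 32-bit Windows exit code as an unsigned hex NTSTATUS."""
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


print(to_ntstatus(-1073740791))  # 0xC0000409
```

Windows also reports this status for fail-fast terminations (`__fastfail`), so it does not necessarily mean a literal stack overflow in the Python code; it may simply indicate that a native library aborted the process.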
