
Running my model.fit() function for my CNN results in:

Process finished with exit code **-1073740791 (0xC0000409)**
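
For reference, the two numbers in that message are the same 32-bit value written two ways. A minimal sketch of the conversion, assuming the usual Windows convention of reading a negative exit code as an unsigned NTSTATUS value (the helper name `to_ntstatus` is made up for illustration):

```python
def to_ntstatus(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned hex value."""
    # Masking with 0xFFFFFFFF maps the negative number onto its
    # two's-complement 32-bit representation.
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


print(to_ntstatus(-1073740791))  # 0xC0000409
```

On Windows, 0xC0000409 is `STATUS_STACK_BUFFER_OVERRUN`, so the code itself only says that the process was aborted inside native code, not which library caused it.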

Installed:

  • nvidia cuda 11.5.1
  • nvidia cudnn 8.3.2.44
  • Tensorflow 2.7.0
  • python 3.9.9

Manually switching the device to the CPU works and the model trains. TensorFlow also detects the GPU, which I checked with:

print(tf.test.is_gpu_available())
print(tf.config.experimental.list_physical_devices())

Some of the log output before the model crashes:

gradient_tape/sequential/batch_normalization_2/moments/BroadcastGradientArgs/s1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
gradient_tape/sequential/flatten/Shape: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/gradients/zeros: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/gradients/zeros_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/gradients/zeros_2: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/gradients/zeros_3: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/gradients/zeros_4: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/gradients/zeros_5: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/gradients/zeros_6: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/gradients/zeros_7: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/add/y: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/sub/x: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/sub_1/x: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/sub_2/x: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/sub_3/x: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/Adam/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
ExpandDims_1/dim: (Const): /job:localhost/replica:0/task:0/device:GPU:0
ArgMax/dimension: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Size: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
batch_loss/write_summary/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
batch_accuracy/write_summary/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.048605: I tensorflow/core/common_runtime/placer.cc:114] gradient_tape/sequential/flatten/Shape: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.048845: I tensorflow/core/common_runtime/placer.cc:114] Adam/gradients/zeros: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.049069: I tensorflow/core/common_runtime/placer.cc:114] Adam/gradients/zeros_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.049295: I tensorflow/core/common_runtime/placer.cc:114] Adam/gradients/zeros_2: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.049521: I tensorflow/core/common_runtime/placer.cc:114] Adam/gradients/zeros_3: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.049748: I tensorflow/core/common_runtime/placer.cc:114] Adam/gradients/zeros_4: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.049973: I tensorflow/core/common_runtime/placer.cc:114] Adam/gradients/zeros_5: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.050199: I tensorflow/core/common_runtime/placer.cc:114] Adam/gradients/zeros_6: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.050424: I tensorflow/core/common_runtime/placer.cc:114] Adam/gradients/zeros_7: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.050638: I tensorflow/core/common_runtime/placer.cc:114] Adam/add/y: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.050841: I tensorflow/core/common_runtime/placer.cc:114] Adam/sub/x: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.051047: I tensorflow/core/common_runtime/placer.cc:114] Adam/sub_1/x: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.051252: I tensorflow/core/common_runtime/placer.cc:114] Adam/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.051453: I tensorflow/core/common_runtime/placer.cc:114] Adam/sub_2/x: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.051655: I tensorflow/core/common_runtime/placer.cc:114] Adam/sub_3/x: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.051861: I tensorflow/core/common_runtime/placer.cc:114] Adam/Adam/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.052072: I tensorflow/core/common_runtime/placer.cc:114] ExpandDims_1/dim: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.052283: I tensorflow/core/common_runtime/placer.cc:114] ArgMax/dimension: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.052493: I tensorflow/core/common_runtime/placer.cc:114] Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.052681: I tensorflow/core/common_runtime/placer.cc:114] Size: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.052872: I tensorflow/core/common_runtime/placer.cc:114] Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.053087: I tensorflow/core/common_runtime/placer.cc:114] batch_loss/write_summary/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.053358: I tensorflow/core/common_runtime/placer.cc:114] batch_accuracy/write_summary/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.138176: I tensorflow/core/common_runtime/eager/execute.cc:1224] Executing op __inference_train_function_1479 in device /job:localhost/replica:0/task:0/device:GPU:0

Process finished with exit code -1073740791 (0xC0000409)

I also tested a basic matmul example on the GPU and it worked, so I think it might be a version conflict between CUDA and cuDNN. The solution from "Tensorflow 2.5 exit code -1073740791 when GPU training" does not work for me.

Requested output for:

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

Num GPUs Available:  1

Thanks for any advice!

last log before exit

Lucas Winkler
  • Could you provide some minimal code to reproduce the error, and also let us know on which platform you are executing this code? –  Jan 04 '22 at 16:28
  • Hi, it's basically this code with some slight changes: https://www.kaggle.com/ahmedzakaria/image-classification-using-cnn-accuracy-0-84/notebook But I don't think it has anything to do with the code itself. The code works fine on the CPU. The error only occurs when I run it on the GPU. I think it might be a problem with the versions of the third-party packages (CUDA/cuDNN) or TensorFlow. – Lucas Winkler Jan 12 '22 at 11:00
  • Yes, [build configuration](https://www.tensorflow.org/install/source_windows#gpu) is not appropriate. It requires `CUDA 11.2` and `cuDNN 8.1` for your specified configuration of `Python 3.9` and `Tensorflow 2.7`. Follow the steps mentioned on Tensorflow site to install cuda for [Windows setup](https://www.tensorflow.org/install/gpu#windows_setup) –  Jan 12 '22 at 11:28
  • So I uninstalled the newer version and installed CUDA 11.2 and cuDNN 8.1.1.33. The environment variables are set as well, and the GPU driver is updated. I still get the same error, even though the GPU is detected when I activate the debug logs. I just added the logs to the question; maybe this helps. – Lucas Winkler Jan 12 '22 at 15:16
  • Please provide the output of this code: `import tensorflow as tf` `print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))` –  Jan 13 '22 at 04:52
  • I did so. It recognizes the GPU. I also provided the output of the logs when I start the code. If I interpret it correctly it uses the GPU for some tasks till it exits the code. – Lucas Winkler Jan 13 '22 at 10:49
  • Could you please check [this](https://stackoverflow.com/questions/50562192/process-finished-with-exit-code-1073740791-0xc0000409-pycharm-error') similar resolved issue? It says this is a stack buffer overflow issue. –  May 20 '22 at 11:00
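
The version advice in the comments can be written as a small check. This is only a sketch: the tested pair (CUDA 11.2, cuDNN 8.1 for TensorFlow 2.7) is taken from the comment above, and the names `TESTED`, `major_minor` and `mismatches` are hypothetical helpers, not part of TensorFlow.

```python
# Tested GPU build for TensorFlow 2.7, as quoted in the comments above.
TESTED = {"cuda": "11.2", "cudnn": "8.1"}


def major_minor(version: str) -> str:
    """Keep only the first two components, e.g. '8.1.1.33' -> '8.1'."""
    return ".".join(version.split(".")[:2])


def mismatches(installed: dict) -> dict:
    """Return {name: (installed, tested)} for every component that differs."""
    return {
        name: (installed[name], wanted)
        for name, wanted in TESTED.items()
        if major_minor(installed[name]) != wanted
    }


# Versions from the original question: both differ from the tested build.
print(mismatches({"cuda": "11.5.1", "cudnn": "8.3.2.44"}))
# Versions after the reinstall described in the comments: nothing differs.
print(mismatches({"cuda": "11.2", "cudnn": "8.1.1.33"}))
```

The second call returns an empty dict, which matches what the comments report: the crash persisted after the versions had been aligned, so the version mismatch alone does not explain it.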

0 Answers