Running my model.fit() function for my CNN results in:
Process finished with exit code **-1073740791 (0xC0000409)**
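For reference, the two numbers in that message are the same value: the negative decimal is the signed 32-bit reading of the hexadecimal code. A minimal sketch of the conversion (`to_unsigned_hex` is only an illustrative helper name, not part of any library):

```python
def to_unsigned_hex(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned hex string."""
    # Masking with 0xFFFFFFFF maps negative values onto their
    # two's-complement 32-bit representation.
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


print(to_unsigned_hex(-1073740791))  # 0xC0000409
```

On Windows, 0xC0000409 is the NTSTATUS value STATUS_STACK_BUFFER_OVERRUN, which suggests the crash happens in native code rather than as a Python exception.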
Installed:
- NVIDIA CUDA 11.5.1
- NVIDIA cuDNN 8.3.2.44
- TensorFlow 2.7.0
- Python 3.9.9
Manually changing the device so that it uses the CPU works, and the model runs. TensorFlow also identifies the GPU, tested with:
tf.test.is_gpu_available()
tf.config.experimental.list_physical_devices()
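The CPU workaround mentioned above can be sketched as follows. This assumes the GPU is hidden through the `CUDA_VISIBLE_DEVICES` environment variable, which is one common way to force CPU execution and not necessarily the exact method used here. `hide_gpus` is a hypothetical helper name, and the model code is omitted:

```python
import os


def hide_gpus() -> None:
    """Hide all CUDA devices from libraries that are imported afterwards."""
    # "-1" is not a valid device index, so no GPU is visible to CUDA.
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


hide_gpus()
# TensorFlow has to be imported only after the variable is set, e.g.:
# import tensorflow as tf
print(os.environ["CUDA_VISIBLE_DEVICES"])  # -1
```

With the GPU hidden this way, the same training script should fall back to the CPU without further code changes.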
Some output of the logs before the model crashes:
gradient_tape/sequential/batch_normalization_2/moments/BroadcastGradientArgs/s1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
gradient_tape/sequential/flatten/Shape: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/gradients/zeros: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/gradients/zeros_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/gradients/zeros_2: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/gradients/zeros_3: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/gradients/zeros_4: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/gradients/zeros_5: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/gradients/zeros_6: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/gradients/zeros_7: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/add/y: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/sub/x: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/sub_1/x: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/sub_2/x: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/sub_3/x: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Adam/Adam/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
ExpandDims_1/dim: (Const): /job:localhost/replica:0/task:0/device:GPU:0
ArgMax/dimension: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Size: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
batch_loss/write_summary/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
batch_accuracy/write_summary/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.048605: I tensorflow/core/common_runtime/placer.cc:114] gradient_tape/sequential/flatten/Shape: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.048845: I tensorflow/core/common_runtime/placer.cc:114] Adam/gradients/zeros: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.049069: I tensorflow/core/common_runtime/placer.cc:114] Adam/gradients/zeros_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.049295: I tensorflow/core/common_runtime/placer.cc:114] Adam/gradients/zeros_2: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.049521: I tensorflow/core/common_runtime/placer.cc:114] Adam/gradients/zeros_3: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.049748: I tensorflow/core/common_runtime/placer.cc:114] Adam/gradients/zeros_4: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.049973: I tensorflow/core/common_runtime/placer.cc:114] Adam/gradients/zeros_5: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.050199: I tensorflow/core/common_runtime/placer.cc:114] Adam/gradients/zeros_6: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.050424: I tensorflow/core/common_runtime/placer.cc:114] Adam/gradients/zeros_7: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.050638: I tensorflow/core/common_runtime/placer.cc:114] Adam/add/y: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.050841: I tensorflow/core/common_runtime/placer.cc:114] Adam/sub/x: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.051047: I tensorflow/core/common_runtime/placer.cc:114] Adam/sub_1/x: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.051252: I tensorflow/core/common_runtime/placer.cc:114] Adam/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.051453: I tensorflow/core/common_runtime/placer.cc:114] Adam/sub_2/x: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.051655: I tensorflow/core/common_runtime/placer.cc:114] Adam/sub_3/x: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.051861: I tensorflow/core/common_runtime/placer.cc:114] Adam/Adam/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.052072: I tensorflow/core/common_runtime/placer.cc:114] ExpandDims_1/dim: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.052283: I tensorflow/core/common_runtime/placer.cc:114] ArgMax/dimension: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.052493: I tensorflow/core/common_runtime/placer.cc:114] Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.052681: I tensorflow/core/common_runtime/placer.cc:114] Size: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.052872: I tensorflow/core/common_runtime/placer.cc:114] Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.053087: I tensorflow/core/common_runtime/placer.cc:114] batch_loss/write_summary/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.053358: I tensorflow/core/common_runtime/placer.cc:114] batch_accuracy/write_summary/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2022-01-12 16:15:13.138176: I tensorflow/core/common_runtime/eager/execute.cc:1224] Executing op __inference_train_function_1479 in device /job:localhost/replica:0/task:0/device:GPU:0
Process finished with exit code -1073740791 (0xC0000409)
Also, I tested a basic matmul example on the GPU and it worked.
I think it might be a version conflict with CUDA/cuDNN.
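To check the suspected version conflict, it may help to compare the installed CUDA/cuDNN versions with the ones the TensorFlow binary was built against. The following is a sketch only: `summarize_build_info` is a hypothetical helper, and the exact keys and value formats returned by `tf.sysconfig.get_build_info()` can differ between platforms and builds, so missing keys are reported as "unknown".

```python
def summarize_build_info(info):
    """Pick the CUDA-related entries out of a build-info mapping."""
    keys = ("is_cuda_build", "cuda_version", "cudnn_version")
    return {key: info.get(key, "unknown") for key in keys}


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this environment")
    else:
        print("TensorFlow:", tf.__version__)
        # Versions the installed TensorFlow binary was built against.
        print(summarize_build_info(tf.sysconfig.get_build_info()))
        # GPUs that TensorFlow can see at runtime.
        print(tf.config.list_physical_devices("GPU"))
```

If the reported build versions differ from the installed CUDA 11.5.1 and cuDNN 8.3.2.44, that would support the version-conflict theory, although it would not prove it.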
The solution from the related question "Tensorflow 2.5 exit code -1073740791" did not fix GPU training for me.
Requested output for:
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
Num GPUs Available: 1
Thanks for any advice!