Python Tensorflow model does not start training and just exits silently

Question

I have a script that trains a simple CNN. It runs fine on my laptop but not on my desktop. When I run it on the desktop, it prints the following and then exits:

    2021-12-29 12:01:24.246746: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
    To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
    2021-12-29 12:01:25.010753: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9618 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3060, pci bus id: 0000:01:00.0, compute capability: 8.6
    Epoch 1/55
    2021-12-29 12:01:30.997154: I tensorflow/stream_executor/cuda/cuda_dnn.cc:366] Loaded cuDNN version 8301

    Process finished with exit code -1073740791 (0xC0000409)

It looks like training is about to start, but then the process exits with a non-zero exit code and no Python traceback. I have TensorFlow 2.7, CUDA, and cuDNN installed. TensorFlow clearly detects my GPU, and the same code works fine on the laptop. I also increased PyCharm's memory allocation, so it has plenty. What is wrong?
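
Since the same script behaves differently on two machines, one way to narrow things down is to print the same environment report on both and compare. The sketch below is only an illustration: `collect_environment` is a hypothetical helper name, and it assumes a TensorFlow version that provides `tf.sysconfig.get_build_info()` (2.3 or later). It reports the CUDA and cuDNN versions the TensorFlow binary was built against, which may differ from what is installed on the machine.

```python
import platform
import sys


def collect_environment():
    """Return a dict describing the Python/TensorFlow/GPU setup of this machine.

    Hypothetical helper for comparing two machines; not part of the original script.
    """
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not importable in this interpreter.
        info["tensorflow"] = None
        return info

    info["tensorflow"] = tf.__version__
    # Versions the binary was built against; the keys are absent on CPU-only builds.
    build = tf.sysconfig.get_build_info()
    info["built_with_cuda"] = build.get("cuda_version")
    info["built_with_cudnn"] = build.get("cudnn_version")
    info["gpus"] = [gpu.name for gpu in tf.config.list_physical_devices("GPU")]
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Run it with the same interpreter PyCharm uses on each machine. A difference in the reported versions is a lead to follow up, not proof of the cause.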

Code:

    from Pipeline.Preprocess import Preprocess_dog_cat as procDogCat
    from Pipeline.Models import Model_dog_cat as modelDogCat


    # Model class
    class CatDogModel:
        def __init__(self, version, model_name, data_dir, saved_weights_dir):
            self.data_dir = data_dir
            self.version_model_name = f'{version}_{model_name}'

            self.model_saved_weights_dir = f'{saved_weights_dir}\\{self.version_model_name}'
            self.log_dir = f'Model-Graphs&Logs\\Model-Data_{model_name}\\Logs\\{self.version_model_name}'

            self.train_gen = None
            self.valid_gen = None
            self.test_gen = None

            self.model = None
            self.history = None

        # Data Preprocessing
        def preprocess(self):
            self.train_gen = procDogCat.train_image_gen(self.data_dir)
            self.valid_gen = procDogCat.valid_image_gen(self.data_dir)
            self.test_gen = procDogCat.test_image_gen(self.data_dir)

        # Model Declaration
        def model_init(self):
            self.model = modelDogCat.seq_maxpool_cnn(self.log_dir)

        # Training
        def training(self):
            callback_list = []

            self.history = self.model.fit(self.train_gen, validation_data=self.valid_gen, batch_size=32,
                                          steps_per_epoch=20, epochs=55, callbacks=callback_list)


    # Executor
    if __name__ == '__main__':
        model_instance = CatDogModel(version='f_beta_test', model_name='dog_cat',
                                     data_dir='D:\\Data-Warehouse\\Dog-Cat-Data\\training_dir',
                                     saved_weights_dir='D:\\Saved-Models\\Dog-Cat-Models')
        model_instance.preprocess()
        model_instance.model_init()
        model_instance.training()
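
The log stops right after cuDNN is loaded, so a quick experiment is to run the same script with the GPU hidden and see whether training proceeds on the CPU. The sketch below is one way to do that under stated assumptions: `hide_gpus_from_tensorflow` is a hypothetical helper, and it relies on the `CUDA_VISIBLE_DEVICES` environment variable being set before TensorFlow initializes CUDA, so it should run before the `Pipeline` imports (which presumably import TensorFlow).

```python
import os
import sys


def hide_gpus_from_tensorflow() -> bool:
    """Hide all CUDA devices from this process.

    Returns True if TensorFlow had not been imported yet. Setting the
    variable before the import is the safe order. Hypothetical helper,
    not part of the original script.
    """
    tensorflow_not_loaded = "tensorflow" not in sys.modules
    # "-1" is not a valid device index, so no GPU is exposed to CUDA.
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
    return tensorflow_not_loaded


if __name__ == "__main__":
    if not hide_gpus_from_tensorflow():
        print("TensorFlow was already imported; set the variable earlier.")
```

If the script trains on the CPU but still crashes with the GPU visible, that points toward the GPU software stack (driver, CUDA, cuDNN) rather than the model code. It does not identify which component is at fault.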

This might help: https://stackoverflow.com/questions/69597944/tensorflow-2-5-exit-code-1073740791-when-gpu-training?rq=1 — obr, Dec 29 '21 at 19:43
0xC0000409 means 'stack buffer overflow'. Please follow [this](https://stackoverflow.com/questions/50562192/process-finished-with-exit-code-1073740791-0xc0000409-pycharm-error) reference to solve this issue. — , Jan 10 '22 at 09:31
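
For reference, the two numbers PyCharm prints are the same value: -1073740791 is the signed 32-bit reading of 0xC0000409, which Windows names STATUS_STACK_BUFFER_OVERRUN. A minimal sketch of the conversion (`to_ntstatus` is a hypothetical helper name):

```python
def to_ntstatus(exit_code: int) -> str:
    """Format a signed 32-bit process exit code as an unsigned hex NTSTATUS value."""
    # Masking with 0xFFFFFFFF reinterprets the negative value as unsigned 32-bit.
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


print(to_ntstatus(-1073740791))  # 0xC0000409
```

Windows also uses this status for fail-fast aborts in native code, so it indicates that a native library terminated the process, not necessarily that a literal stack buffer overflow occurred.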
