My fit function is breaking after running the first epoch

Question

I am new to ML, so please go easy on me; I might be missing something simple. I did a course on Freecodecamp.com for Machine Learning in Python, and I'm now doing one of the examples involving CNNs, which trains a model to detect whether an image contains a cat or a dog. I finally got my model working today with 75% accuracy, but I wasn't sure it was using the validation data correctly, because somewhere along the way I set my validation classes to classes=['.'], and that's when it started working (see below).

val_data_gen = validation_image_generator.flow_from_directory(
    validation_dir, 
    target_size=(IMG_HEIGHT,IMG_WIDTH),
    batch_size=batch_size,
    classes=classes, #classes=['.'] worked before idk why..
    class_mode="categorical")

I later noticed that and fixed it so that my validation data has the correct classes, but now my fit function runs for exactly one epoch every time and then throws the exception below (summarized):

     50   try:
     51     ctx.ensure_initialized()
---> 52     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     53                                         inputs, attrs, num_outputs)
     54   except core._NotOkStatusException as e:

InvalidArgumentError: Graph execution error:

classes is defined as classes = ["cats", "dogs"] and my fit function is below for reference:

history = model.fit(
    train_data_gen,
    validation_data=(val_data_gen, classes),
    epochs=epochs,
    batch_size=batch_size,
    validation_steps=len(val_data_gen)
)

Here is the Google Colab link if you would like to see more detail.

I have tried passing classes to the validation_data parameter (as the validation labels), as in the fit call shown above.

I have also tried it without passing the classes, i.e. validation_data=val_data_gen. I have tried changing the last dense layer to a single unit, model.add(layers.Dense(1)), but I know that's wrong because I have 2 categories/classes, and I believe I got the same result in the end. I have also tried adding and removing the batch_size and validation_steps parameters, as suggested in other StackOverflow questions. The only other similar question I found on this website was from a person passing the wrong value to the Dense layer, which does not seem to be my problem.

This is my model structure:

> model = Sequential() 
> model.add(keras.Input(shape=(IMG_HEIGHT,IMG_WIDTH, 3)))
> model.add(layers.Conv2D(32, (3,3), activation='relu'))
> model.add(layers.MaxPooling2D(2,2)) 
> model.add(layers.Conv2D(64, (3,3),activation='relu'))
> model.add(layers.MaxPooling2D(2,2))
> model.add(layers.Conv2D(64, (3,3), activation='relu'))
> model.add(layers.MaxPooling2D(2,2)) 
> model.add(layers.Flatten())
> model.add(layers.Dense(64, activation='relu'))
> model.add(layers.Dense(2))

and this is my compile method:

> model.compile(optimizer='adam',
>               loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
>               metrics=['accuracy'])

Thank you all for your time and patience, and let me know if you want me to try anything.

What optimizer are you using? Can you show your `compile()` method? And model structure as well. — NotAName, Feb 20 '23 at 01:35
Hello Pavel, I just updated the question to include the compile method and the model structure as requested. I'm using the adam optimizer — Sami Haddad, Feb 20 '23 at 01:52
So you're using `from_logits=True`, this means that your loss function expects a tensor of logits, but your final Dense layer returns a simple probability distribution. So either remove `from_logits` parameter or use sigmoid activation for your final Dense layer. — NotAName, Feb 20 '23 at 02:12
I tried both suggestions, but both fail on me before even running the first epoch. — Sami Haddad, Feb 20 '23 at 02:44
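For reference on the from_logits discussion: a Keras Dense layer with no activation argument outputs raw scores (logits), so Dense(2) paired with from_logits=True is a consistent combination. The sketch below uses plain Python rather than TensorFlow to show that computing the loss directly from logits gives the same value as applying softmax first and then taking the loss from probabilities. The function names and the example scores are made up for illustration and are not part of Keras.

```python
import math

def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_from_probs(probs, true_index):
    """Loss when the model already outputs probabilities (like from_logits=False)."""
    return -math.log(probs[true_index])

def cross_entropy_from_logits(logits, true_index):
    """Loss when the model outputs raw scores (like from_logits=True)."""
    shift = max(logits)
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits))
    return log_sum_exp - logits[true_index]

# Hypothetical raw outputs of a 2-unit Dense layer for one image (cats=0, dogs=1).
logits = [1.2, -0.4]
probs = softmax(logits)

loss_from_logits = cross_entropy_from_logits(logits, true_index=0)
loss_from_probs = cross_entropy_from_probs(probs, true_index=0)
print(probs, loss_from_logits, loss_from_probs)
```

The two losses agree, so the choice is only about where the softmax happens: either the last layer has a softmax activation and the loss uses from_logits=False, or the last layer has no activation and the loss uses from_logits=True.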

score 1 · Answer 1 · answered Feb 20 '23 at 03:46

After trying Pavel's suggestions, my code started crashing before the first epoch, but with a more detailed error message: logits and labels must have the same first dimension. I looked this error up and found a solution on SO: I had to change my loss function from SparseCategoricalCrossentropy to CategoricalCrossentropy. This makes sense because my validation generator uses class_mode="categorical", which produces one-hot labels, while SparseCategoricalCrossentropy expects integer class indices. It works now: I ran my fit function again and I'm getting 85% accuracy. Hope this helps someone.
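The label-format mismatch behind that error message can be sketched in plain Python. This is an illustration under an assumption, not the Keras implementation: it assumes the sparse loss flattens whatever labels it receives into one list of class ids, which is consistent with the "same first dimension" message. The helper names and the batch of four labels are hypothetical.

```python
def one_hot(index, num_classes):
    """Encode an integer class id as a one-hot row, e.g. 1 -> [0.0, 1.0]."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]

def flatten(rows):
    """Collapse a list of rows into a single flat list."""
    return [value for row in rows for value in row]

# Hypothetical batch of 4 images: integer ("sparse") labels, cats=0, dogs=1.
sparse_labels = [0, 1, 1, 0]
batch_size = len(sparse_labels)  # the model emits one row of logits per image

# class_mode="categorical" with two classes yields one-hot rows, shape (4, 2).
two_class_labels = [one_hot(i, 2) for i in sparse_labels]
# Assumed behaviour: the sparse loss flattens these into 8 "labels" for 4 rows of logits.
flat_two_class = flatten(two_class_labels)
print(len(flat_two_class), "labels vs", batch_size, "rows of logits")

# With a single class (as with classes=['.']), each one-hot row has width 1,
# so flattening still gives 4 labels and no shape error is raised.
one_class_labels = [one_hot(0, 1) for _ in sparse_labels]
flat_one_class = flatten(one_class_labels)
print(len(flat_one_class), "labels vs", batch_size, "rows of logits")

# One-hot rows can be turned back into integer ids with an argmax.
recovered = [row.index(max(row)) for row in two_class_labels]
print(recovered)
```

Under that assumption, this would also explain why classes=['.'] appeared to work earlier: the shapes happened to line up, but every validation label was the same single class. The two consistent options are one-hot labels with CategoricalCrossentropy, or integer labels (class_mode="sparse") with SparseCategoricalCrossentropy.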
