3

I am currently experimenting with fine-tuning the VGG16 network using Keras.

I started by tweaking it a little on the cats and dogs dataset.

However, with the current configuration, training seems to hang on the first epoch:

from keras import applications
from keras.preprocessing.image import ImageDataGenerator
from keras import optimizers
from keras.models import Sequential, Model
from keras.layers import Dropout, Flatten, Dense


img_width, img_height = 224, 224

train_data_dir = 'data/train'
validation_data_dir = 'data/validation'
nb_train_samples = 2000
nb_validation_samples = 800
epochs = 50
batch_size = 20


model = applications.VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
print('Model loaded.')


top_model = Sequential()
top_model.add(Flatten(input_shape=model.output_shape[1:]))
top_model.add(Dense(256, activation='relu', name='newlayer'))
top_model.add(Dropout(0.5))
top_model.add(Dense(2, activation='softmax'))


model = Model(inputs=model.input, outputs=top_model(model.output))


for layer in model.layers[:19]:
    layer.trainable = False


model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.Adam(lr=0.0001),
              metrics=['accuracy'])

train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1. / 255)

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    shuffle=True,
    class_mode='categorical')

validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical')


model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples)

Last output:

Epoch 1/50 99/100 [============================>.] - ETA: 0s - loss: 0.5174 - acc: 0.7581

Am I missing something?

Lilo
  • How long did you wait? It's probably just using the validation generator to estimate the validation loss, which takes time. – Dr. Snoopy Feb 12 '18 at 15:03
  • It turns out that you are right. However, why does using the validation generator take more time than the training itself? (To my understanding it is just predicting on the validation set, no?) – Lilo Feb 12 '18 at 15:10
  • That is probably because you should give validation_steps as number of samples / batch size. In this case it's running the validation generator for longer than it has to. – Dr. Snoopy Feb 12 '18 at 16:03
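
Following that suggestion, a minimal sketch of the corrected call (same variable names as in the question; validation_steps is now a number of batches rather than a number of samples):

model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size)  # 800 // 20 = 40 validation batches per epoch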

5 Answers

2

Shuffle

In my case, I was calling fit(...) with shuffle='batch'. Removing this parameter from the arguments resolved the problem. (I assume it's a TensorFlow bug but I didn't dig into it.)

Validation

Another consideration is that validation is performed at the end of the epoch. If your validation data isn't being batched, and particularly if you are padding your data, you could be running validation on a single batch much larger than your training batch size, padded to the maximum sample length of your validation data. That can easily turn into an out-of-memory problem.
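
A minimal sketch of both points, assuming hypothetical in-memory arrays x_train, y_train, x_val and y_val (in Keras, the batch_size passed to fit is also used when evaluating validation_data):

model.fit(x_train, y_train,
          epochs=epochs,
          batch_size=batch_size,              # also applied to the validation pass
          shuffle=True,                       # instead of shuffle='batch'
          validation_data=(x_val, y_val))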

Eric McLachlan
0

I faced this problem in Colab, which provides only limited memory (about 12 GB) in the cloud, and that creates many issues. That's why only 300 images were used for training and testing. When the images were preprocessed at 600x600 and the batch size was set to 128, the Keras model froze during epoch 1 without showing any error. The actual cause was running out of the limited runtime memory, since Colab only gives about 12 GB. The problem was solved by reducing the batch size to 4 and the image dimensions to 300x300; with 600x600 it still did not work.

In short, the recommended fix is to keep reducing the image dimensions and batch size, and rerun, until you no longer hit a runtime (memory) error.
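
A hedged sketch of that change, reusing the generator setup from the question (the exact values are just what happened to fit in my runtime's memory):

img_width, img_height = 300, 300   # down from 600x600
batch_size = 4                     # down from 128

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical')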

0

I faced the same issue. It happens because the model is being evaluated on the validation dataset, and this usually takes a lot of time. Try reducing the validation dataset, or just wait for a while; that worked for me. It looks like the training is stuck, but it is actually running on the validation data.
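
If you want to confirm that the validation pass really is the slow part, one quick check (a sketch using the generator and step count from the question) is to time it on its own:

import time

start = time.time()
model.evaluate_generator(validation_generator,
                         steps=nb_validation_samples // batch_size)
print('Validation pass took %.1f s' % (time.time() - start))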

0

If you are using from tensorflow.keras.preprocessing.image import ImageDataGenerator, try changing it to from keras.preprocessing.image import ImageDataGenerator, or vice versa. That worked for me. It's often said that you should never mix keras and tensorflow.keras imports.
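
For example, if the model comes from keras.applications as in the question, keeping every import from the same package would look like this (the tf.keras equivalents would all start with tensorflow.keras instead):

from keras import applications, optimizers
from keras.models import Sequential, Model
from keras.layers import Dropout, Flatten, Dense
from keras.preprocessing.image import ImageDataGenerator   # not tensorflow.keras.preprocessing.image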

RonithSaju
0

I tried everything posted here, but it didn't work for me. I found the solution by simply wrapping the validation set in a numpy array, like this:

numpy.array(validation_x)
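
In context, that would look roughly like this (train_x, train_y, validation_x and validation_y are hypothetical arrays or lists holding the images and labels):

import numpy as np

model.fit(train_x, train_y,
          epochs=epochs,
          batch_size=batch_size,
          validation_data=(np.array(validation_x), np.array(validation_y)))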

Super simple. Works like a charm. I hope this helps someone.

skywalkerdk