3

I am currently experimenting with fine-tuning the VGG16 network using Keras.

I started by tweaking it a little on the cats and dogs dataset.

However, with the current configuration, training seems to hang on the first epoch:

from keras import applications
from keras.preprocessing.image import ImageDataGenerator
from keras import optimizers
from keras.models import Sequential, Model
from keras.layers import Dropout, Flatten, Dense


img_width, img_height = 224, 224

train_data_dir = 'data/train'
validation_data_dir = 'data/validation'
nb_train_samples = 2000
nb_validation_samples = 800
epochs = 50
batch_size = 20


model = applications.VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
print('Model loaded.')


top_model = Sequential()
top_model.add(Flatten(input_shape=model.output_shape[1:]))
top_model.add(Dense(256, activation='relu', name='newlayer'))
top_model.add(Dropout(0.5))
top_model.add(Dense(2, activation='softmax'))


model = Model(inputs=model.input, outputs=top_model(model.output))


for layer in model.layers[:19]:
    layer.trainable = False


model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.Adam(lr=0.0001),
              metrics=['accuracy'])

train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1. / 255)

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    shuffle=True,
    class_mode='categorical')

validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical')


model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples)

Last output:

Epoch 1/50 99/100 [============================>.] - ETA: 0s - loss: 0.5174 - acc: 0.7581

Am I missing something?

Lilo
  • How long did you wait? It's probably just using the validation generator to estimate the validation loss, which takes time. – Dr. Snoopy Feb 12 '18 at 15:03
  • It turns out that you are right. However, why does using the validation generator take more time than the training itself? (To my understanding it is just predicting on the validation set, no?) – Lilo Feb 12 '18 at 15:10
  • That is probably because you should give validation_steps as number of samples / batch size. In this case it's running the validation generator for longer than it has to. – Dr. Snoopy Feb 12 '18 at 16:03
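
Following that suggestion, a minimal sketch of the corrected call (same variable names as in the question; validation_steps is now a number of batches rather than a number of samples):

model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size)  # 800 // 20 = 40 validation batches per epoch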

5 Answers

2

Shuffle

In my case, I was calling fit(...) with shuffle='batch'. Removing this parameter from the arguments resolved the problem. (I assume it's a TensorFlow bug but I didn't dig into it.)

Validation

Another consideration is that validation is performed at the end of the epoch. If your validation data isn't being batched, and particularly if you are padding your data, you could be running validation on a single batch much larger than your training batch size, padded to the maximum sample length of your validation data. That can easily turn into an out-of-memory problem.
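
A minimal sketch of both points, assuming hypothetical in-memory arrays x_train, y_train, x_val and y_val (in Keras, the batch_size passed to fit is also used when evaluating validation_data):

model.fit(x_train, y_train,
          epochs=epochs,
          batch_size=batch_size,              # also applied to the validation pass
          shuffle=True,                       # instead of shuffle='batch'
          validation_data=(x_val, y_val))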

Eric McLachlan
0

I faced this problem in Colab, which provides only limited memory (about 12 GB) in the cloud, and that creates many issues. That's why only 300 images were used for training and testing. When the images were preprocessed at 600x600 and the batch size was set to 128, the Keras model froze during epoch 1 without showing any error. The actual cause was running out of the limited runtime memory, since Colab only gives about 12 GB. The problem was solved by reducing the batch size to 4 and the image dimensions to 300x300; with 600x600 it still did not work.

In short, the recommended fix is to keep reducing the image dimensions and batch size, and rerun, until you no longer hit a runtime (memory) error.
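
A hedged sketch of that change, reusing the generator setup from the question (the exact values are just what happened to fit in my runtime's memory):

img_width, img_height = 300, 300   # down from 600x600
batch_size = 4                     # down from 128

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical')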

0

I faced the same issue. It happens because the model is being evaluated on the validation dataset, and this usually takes a lot of time. Try reducing the validation dataset, or just wait for a while; that worked for me. It looks like the training is stuck, but it is actually running on the validation data.
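
If you want to confirm that the validation pass really is the slow part, one quick check (a sketch using the generator and step count from the question) is to time it on its own:

import time

start = time.time()
model.evaluate_generator(validation_generator,
                         steps=nb_validation_samples // batch_size)
print('Validation pass took %.1f s' % (time.time() - start))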

0

If you are using from tensorflow.keras.preprocessing.image import ImageDataGenerator, try changing it to from keras.preprocessing.image import ImageDataGenerator, or vice versa. That worked for me. It's often said that you should never mix keras and tensorflow.keras imports.
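
For example, if the model comes from keras.applications as in the question, keeping every import from the same package would look like this (the tf.keras equivalents would all start with tensorflow.keras instead):

from keras import applications, optimizers
from keras.models import Sequential, Model
from keras.layers import Dropout, Flatten, Dense
from keras.preprocessing.image import ImageDataGenerator   # not tensorflow.keras.preprocessing.image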

RonithSaju
0

I tried everything posted here, but it didn't work for me. I found the solution by simply wrapping the validation set in a numpy array, like this:

numpy.array(validation_x)
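
In context, that would look roughly like this (train_x, train_y, validation_x and validation_y are hypothetical arrays or lists holding the images and labels):

import numpy as np

model.fit(train_x, train_y,
          epochs=epochs,
          batch_size=batch_size,
          validation_data=(np.array(validation_x), np.array(validation_y)))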

Super simple. Works like a charm. I hope this helps someone.

skywalkerdk