I am trying to build a multi-class classifier for 24 classes with Keras, using VGG16 bottleneck features and a small fully connected model on top.
At first I followed this tutorial: https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html, adapting it to multi-class, and got the error below. I then tried this other tutorial's code: http://www.codesofinterest.com/2017/08/bottleneck-features-multi-class-classification-keras.html and got the exact same error. I cannot figure out what the problem is!
The error I get is: "ValueError: Input arrays should have the same number of samples as target arrays. Found 12768 input samples and 12782 target samples."
Basically I have two folders, train and validation. The train folder has 52992 PNG images, the validation folder has 12782 PNG images. My batch size is 16.
Here is the code in save_bottleneck_features() where I save the validation data (this function is called before the train_top_model() function):
generator = datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=False)

nb_validation_samples = len(generator.filenames)

# number of batches predict_generator should run to cover the validation set
predict_size_validation = int(
    math.ceil(nb_validation_samples / batch_size))

bottleneck_features_validation = model.predict_generator(
    generator, predict_size_validation)

np.save('bottleneck_features_validation.npy',
        bottleneck_features_validation)
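If I understand predict_generator correctly, it pulls predict_size_validation batches of batch_size images each, so the saved array should have about predict_size_validation * batch_size rows. Spelling that out with my numbers (just arithmetic, not output from the real run):

# 798 steps of 16 images each -> 12768 rows in the saved .npy
steps = 798        # what predict_size_validation comes out as for me
batch_size = 16
print steps * batch_size   # 12768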
And here is the code in train_top_model() where I calculate the validation labels:
generator_top = datagen_top.flow_from_directory(
    validation_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=False)

nb_validation_samples = len(generator_top.filenames)

validation_data = np.load('bottleneck_features_validation.npy')

# build the validation labels and one-hot encode them
validation_labels = generator_top.classes
validation_labels = np.array(
    [0] * (nb_validation_samples / 2) + [1] * (nb_validation_samples / 2))
validation_labels = to_categorical(
    validation_labels, num_classes=num_classes)
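For reference, this is how I understand to_categorical to behave (a tiny standalone example, not my real data):

from keras.utils import to_categorical

# integer class indices become one-hot rows of length num_classes
labels = to_categorical([0, 5, 23], num_classes=24)
print labels.shape   # (3, 24)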
These are the counts I get:

print predict_size_validation    # prints 798
print nb_validation_samples      # prints 12782
print len(validation_data)       # prints 12768
print len(validation_labels)     # prints 12782
Train data and train labels are calculated in the same way, and there they are OK: the sample counts match.
I think the problem may be with predict_size_validation, since 12782 is not divisible by 16.
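In case it matters, this is the rounding I am worried about (assuming Python 2 division semantics, since my print statements have no parentheses):

import math

# in Python 2, 12782 / 16 is integer division, so the .875 is lost
# before math.ceil ever sees it
print int(math.ceil(12782 / 16))     # 798 (integer division)
print int(math.ceil(12782 / 16.0))   # 799 (float division keeps the remainder)
# 798 full batches of 16 images are the 12768 samples I end up with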
Thank you!!!