
I am trying to build a multi-class classifier for 24 classes with Keras, using VGG16 bottleneck features and a small fully connected model on top.

At first I was trying to follow this tutorial: https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html, adapting it to multi-class. When I got the error, I tried the code from this other tutorial: http://www.codesofinterest.com/2017/08/bottleneck-features-multi-class-classification-keras.html and got the exact same error. I cannot figure out what the problem is!

The error I get is: `ValueError: Input arrays should have the same number of samples as target arrays. Found 12768 input samples and 12782 target samples.`

Basically I have two folders, train and validation. The train folder has 52992 png images, the validation folder has 12782 png images. My batch size is 16.

Here is the code in save_bottleneck_features() where I save the validation data (this function is called before the train_top_model() function):

generator = datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=False)

nb_validation_samples = len(generator.filenames)

predict_size_validation = int(
    math.ceil(nb_validation_samples / batch_size))


bottleneck_features_validation = model.predict_generator(
    generator, predict_size_validation)

np.save('bottleneck_features_validation.npy',
        bottleneck_features_validation)

And here is the code in train_top_model() where I calculate the validation labels:

generator_top = datagen_top.flow_from_directory(
    validation_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode= 'categorical',
    shuffle=False)

nb_validation_samples = len(generator_top.filenames)

validation_data = np.load('bottleneck_features_validation.npy')

validation_labels = generator_top.classes
validation_labels = np.array(
     [0] * (nb_validation_samples / 2) + [1] * (nb_validation_samples / 2))
validation_labels = to_categorical(
   validation_labels, num_classes=num_classes)

print predict_size_validation    # prints 798
print nb_validation_samples      # prints 12782
print len(validation_data)       # prints 12768
print len(validation_labels)     # prints 12782

Train data and train labels are calculated in the same way but they are OK.

I think that maybe the problem is with predict_size_validation, and that 12782 is not divisible by 16.
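The numbers seem to line up with that: 798 batches of 16 images cover only 12768 samples, which is exactly the input count in the error message, 14 short of the 12782 labels:

```python
batch_size = 16
nb_validation_samples = 12782

# 798 full batches of 16 cover only 12768 samples -- the count in the error.
print(798 * batch_size)                           # 12768
print(nb_validation_samples - 798 * batch_size)   # 14 samples never predicted

# With 799 batches the generator would cover all 12782 samples.
print(799 * batch_size >= nb_validation_samples)  # True
```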

Thank you!!!

MaggieD
  • When I run `int(math.ceil(12782 / 16))`, I get as a result 799 instead of 798. Is it possible that you used some different version of your code? – Mr Tsjolder from codidact Nov 15 '17 at 10:48
  • No I am sure this is the code. I tried printing nb_validation_samples and it is 12782 ! I also tried to print `print int(math.ceil((12782/16)))` and it prints 798 – MaggieD Nov 15 '17 at 11:03
  • That is very odd. After all, `12782 / 16` is exactly 798.875 and thus `math.ceil` of this number should give 799 and not 798! – Mr Tsjolder from codidact Nov 15 '17 at 13:05

1 Answer


In Python 2, which I assume you are using given the comments, dividing two integers performs integer (floor) division by default. This means that `12782 / 16 == 798` (in Python 3, the same operation is written `12782 // 16`), instead of `12782 / 16 == 798.875` as is the case in Python 3. The floor happens before `math.ceil` ever sees the value, so there is no fraction left to round up.
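A quick way to see the difference (this snippet runs under Python 3, where `//` reproduces Python 2's integer `/`):

```python
import math

# Python 2 behaviour: `/` on two ints floors first, so ceil has nothing to round up.
print(12782 // 16)                  # 798  (what Python 2's 12782 / 16 computes)
print(int(math.ceil(12782 // 16)))  # 798  (the value the question reports)

# Python 3 behaviour: true division keeps the fraction, so ceil rounds up.
print(12782 / 16)                   # 798.875
print(int(math.ceil(12782 / 16)))   # 799
```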

To resolve this issue, make sure one of the operands in the division is a float so you get true division, e.g.

import math

predict_size_validation = int(math.ceil(nb_validation_samples / float(batch_size)))

Alternatively, you can use the `__future__` module to get the Python 3 behaviour. Note that a `from __future__` import must be the very first statement in the file, before any other imports:

from __future__ import division

import math

predict_size_validation = int(math.ceil(nb_validation_samples / batch_size))

Yet another solution is to rely on the integer division to do the computation (instead of relying on math.ceil):

predict_size_validation = nb_validation_samples // batch_size
if nb_validation_samples % batch_size != 0:
    predict_size_validation += 1
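The same two-line computation is often written as a one-line integer idiom, avoiding `math.ceil` and floats entirely (`ceil_div` is just an illustrative name, not part of the original code):

```python
def ceil_div(n, d):
    # Ceiling division using only integer arithmetic: adding d - 1 before
    # flooring bumps any non-exact quotient up by one. Behaves identically
    # in Python 2 and Python 3, with no float rounding involved.
    return (n + d - 1) // d

print(ceil_div(12782, 16))  # 799  (one extra batch for the 14 leftover samples)
print(ceil_div(12768, 16))  # 798  (exactly divisible: no extra batch)
```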

For more information on Python 2's division behaviour, see this answer

  • Thanks! Now the result is 799, I will run the script again and see if it works. Saving the bottleneck features takes a little time, I will let you know how it goes! – MaggieD Nov 15 '17 at 14:35