
Problem: I am training a model for multilabel image recognition, so each image is associated with multiple y labels. This conflicts with the convenient Keras method "flow_from_directory" of the ImageDataGenerator, where each image is expected to sit in the folder of its single corresponding label (https://keras.io/preprocessing/image/).

Workaround: Currently, I read all images into a numpy array and use the "flow" function from there. But this causes a heavy memory load and a slow read-in process.

Question: Is there a way to use the "flow_from_directory" method and supply the (multiple) class labels manually?


Update: I ended up extending the DirectoryIterator class for the multilabel case. You can now set the attribute "class_mode" to the value "multilabel" and provide a dictionary "multilabel_classes" which maps filenames to their labels. Code: https://github.com/tholor/keras/commit/29ceafca3c4792cb480829c5768510e4bdb489c5
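To illustrate how such a filename-to-labels dictionary might be built (the file names, class names, and the multi-hot encoding here are made-up assumptions, not part of the fork's API):

```python
import numpy as np

# Hypothetical label data: filename -> list of class names
raw_labels = {
    "img_001.jpg": ["cat", "outdoor"],
    "img_002.jpg": ["dog"],
    "img_003.jpg": ["cat", "dog", "indoor"],
}

# Fix a class order so every image maps to the same vector layout
classes = sorted({c for labels in raw_labels.values() for c in labels})
class_index = {c: i for i, c in enumerate(classes)}

def to_multi_hot(label_list):
    """Encode a list of class names as a multi-hot vector."""
    vec = np.zeros(len(classes), dtype=np.float32)
    for c in label_list:
        vec[class_index[c]] = 1.0
    return vec

# A dict like this could then serve as the multilabel_classes mapping
multilabel_classes = {fname: to_multi_hot(lbls)
                      for fname, lbls in raw_labels.items()}
```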

Malte
    flow_from_directory assumes that the images are split between directories and each directory's name is the target. The general idea of Keras is to simplify usage (versus TF and Theano) but it comes with the cost of lack of customization. You shouldn't load all images into memory, create instead directories that represent the various classes and store the corresponding images inside. Take a look at the very nice F. Chollet tutorial: https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html – thecheech Mar 29 '17 at 08:10
    Storing the images in directories that represent the various classes is not really feasible in the multilabel situation. With 100 classes and 1-6 classes per image the possible combinations are already huge. If there's no other workaround, I will probably extend the DirectoryIterator class in keras/preprocessing/image.py – Malte Mar 29 '17 at 08:20
  • Great fix. Have you created a pull request for this? I think it's something the maintainers might/should consider adding. – gaw89 Apr 03 '17 at 09:47
  • I have just created a pull request: https://github.com/fchollet/keras/pull/6128 – Malte Apr 03 '17 at 17:19

3 Answers


You could simply use flow_from_directory and extend it to the multilabel case in the following manner:

def multiclass_flow_from_directory(flow_from_directory_gen, multiclasses_getter):
    for x, y in flow_from_directory_gen:
        yield x, multiclasses_getter(x, y)

Here multiclasses_getter assigns a multilabel vector / your multilabel representation to your images. Note that x and y are not single examples but batches of examples, so this should be accounted for in your multiclasses_getter design.
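As a concrete sketch of one possible multiclasses_getter (the lookup table and the fake batch below are illustrative assumptions): it maps each original single-label class index to a multi-hot vector, applied to the whole batch at once.

```python
import numpy as np

# Assumed lookup: row i is the multi-hot target for original class i.
# Here: class 0 -> labels {0, 2}, class 1 -> {1}, class 2 -> {1, 2}
label_table = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [0, 1, 1],
], dtype=np.float32)

def multiclasses_getter(x, y):
    """Map a batch of one-hot single-label targets to multi-hot targets."""
    return label_table[y.argmax(axis=1)]

def multiclass_flow_from_directory(flow_from_directory_gen, multiclasses_getter):
    # Same wrapper as in the answer above, repeated so this sketch is self-contained
    for x, y in flow_from_directory_gen:
        yield x, multiclasses_getter(x, y)

# Tiny fake batch standing in for a flow_from_directory batch
x_batch = np.zeros((2, 4, 4, 3))                               # 2 dummy "images"
y_batch = np.array([[1, 0, 0], [0, 0, 1]], dtype=np.float32)   # one-hot labels

x_out, y_out = next(
    multiclass_flow_from_directory(iter([(x_batch, y_batch)]), multiclasses_getter))
```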

Marcin Możejko

You could write a custom generator class that would read the files in from the directory and apply the labeling. That custom generator could also take in an ImageDataGenerator instance which would produce the batches using flow().

I am imagining something like this:

class Generator():

    def __init__(self, X, Y, img_data_gen, batch_size):
        self.X = X
        self.Y = Y  # Maybe a file that has the appropriate label mapping?
        self.img_data_gen = img_data_gen  # The ImageDataGenerator instance
        self.batch_size = batch_size

    def apply_labels(self):
        # Code to apply labels to each sample based on self.X and self.Y
        pass

    def get_next_batch(self):
        """Yield training batches; flow() itself loops indefinitely."""
        for batch in self.img_data_gen.flow(self.X, self.Y,
                                            batch_size=self.batch_size):
            yield batch

Then simply:

img_gen = ImageDataGenerator(...)
gen = Generator(X, Y, img_gen, 128)

model.fit_generator(gen.get_next_batch(), ...)

*Disclaimer: I haven't actually tested this, but it should work in theory.
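A minimal self-contained sketch of such a generator, leaving out the ImageDataGenerator piece and using made-up array data (all names and shapes here are assumptions for illustration):

```python
import numpy as np

def multilabel_batch_generator(X, Y, batch_size):
    """Yield (x, y) batches forever, as Keras' fit_generator expects."""
    n = len(X)
    while True:  # Keras expects the generator to loop indefinitely
        for start in range(0, n, batch_size):
            yield X[start:start + batch_size], Y[start:start + batch_size]

# Dummy data: 10 "images" with 4 possible labels each (multi-hot targets)
X = np.random.rand(10, 8, 8, 3)
Y = (np.random.rand(10, 4) > 0.5).astype(np.float32)

gen = multilabel_batch_generator(X, Y, batch_size=4)
x_batch, y_batch = next(gen)
```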

gaw89
  • You might also need while True:... under the get_next_batch() method because a generator is expected to provide data in an infinite loop. – gaw89 Mar 30 '17 at 15:06
# Training the model
history = model.fit(
    train_generator,
    steps_per_epoch=steps_per_epoch,
    epochs=3,
    validation_data=val_generator,
    validation_steps=validation_steps,
    verbose=1,
    callbacks=[keras.callbacks.ModelCheckpoint(  # callbacks must be a list
        filepath='/content/results',
        monitor='val_accuracy',
        save_best_only=True,
        save_weights_only=False)])

The validation_steps or the steps_per_epoch may exceed the number of batches the generator can actually provide.

Setting steps_per_epoch = int(num_of_training_examples / batch_size) might help. Similarly, validation_steps = int(num_of_val_examples / batch_size) will help.
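For example (the example counts below are made up), rounding up with math.ceil keeps the final partial batch instead of silently dropping it:

```python
import math

num_of_training_examples = 1050
num_of_val_examples = 230
batch_size = 32

# Floor division would drop the final partial batch; ceil keeps it
steps_per_epoch = math.ceil(num_of_training_examples / batch_size)
validation_steps = math.ceil(num_of_val_examples / batch_size)
```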

Tonechas
    Welcome to Stack Overflow. Please refer to [How do I write a good answer?](https://stackoverflow.com/help/how-to-answer). In particular, take care to format your code sections properly and explain how the included code answers/solves OP's question. – Ivo Mori Aug 03 '20 at 05:26