
I wish to use different target vectors (not the standard one-hot encoded ones) for training my CNN. My image data lies in 10 different folders (10 different categories). How do I use my desired target vectors? flow_from_directory() outputs a one-hot encoded array of labels. I have the label vectors stored in a dictionary. Also, the names of the folders are the labels, if that helps.

  • Could you provide some code of what you have tried? – Ryan Morton Jun 18 '18 at 21:12
  • You can easily wrap the `ImageDataGenerator` inside another function and manipulate the target vector generated. Let me know if you don't know how to do that. Note that this approach only works if your customized target vectors can be inferred from the category of the image or content of the image. – today Jun 18 '18 at 21:13
  • @today I am unable to write that custom function. Help would be highly appreciated! – xerxes01 Jun 18 '18 at 21:16
  • @xerxes01 Did you read my updated comment? Let me explain more: this approach works if all the images in category 1 have a target vector of say `[2, 9.8, 19, 78]`, or by analyzing the content of image you can generate its target vector. Is it the case? – today Jun 18 '18 at 21:20
  • @today yep that's exactly the case, all images in a certain class/folder have the same target vector. – xerxes01 Jun 18 '18 at 21:32

1 Answer


Well, as you may know, the iterator returned by ImageDataGenerator's flow_from_directory() in Keras behaves like a Python generator: each step yields a batch of images together with their labels (if you are not familiar with Python generators, it is worth reading up on them first). Since you want to use customized target vectors instead of the labels generated by flow_from_directory(), you can change what the image generator yields by wrapping it inside another function. Here is how:

First we need to store our custom targets as a NumPy array, with one row per class:

```python
import numpy as np

# A NumPy array containing the custom targets for each class:
# custom_targets[0] is the target vector of class #1 images,
# custom_targets[1] is the target vector of class #2 images, etc.
custom_targets = np.asarray(your_custom_targets, dtype='float32')
```
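Since the question says the target vectors are stored in a dictionary keyed by folder name, here is a minimal sketch of turning such a dictionary into custom_targets. It assumes you fix the class order yourself and pass the same list as the `classes` argument of flow_from_directory(), so that index i always means class_names[i]. The names `label_vectors` and `class_names` and the example vectors are hypothetical.

```python
import numpy as np

# Hypothetical example: target vectors keyed by folder (class) name
label_vectors = {
    'cats': [2.0, 9.8, 19.0, 78.0],
    'dogs': [1.0, 0.5, 3.0, 4.0],
    'birds': [7.0, 7.0, 0.0, 1.0],
}

# Fix the class order explicitly; pass this same list as `classes=`
# to flow_from_directory() so that label index i means class_names[i]
class_names = sorted(label_vectors)  # ['birds', 'cats', 'dogs']

# Row i of custom_targets is the target vector of class_names[i]
custom_targets = np.array([label_vectors[name] for name in class_names],
                          dtype='float32')
print(custom_targets.shape)  # (3, 4)
```

Building the array from an explicit list (rather than relying on dictionary iteration order) keeps the row order and the generator's label indices in sync by construction.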

Secondly, we create an image generator as usual and use flow_from_directory() to read images from disk. Set the class_mode argument to 'sparse' so that each label is the integer class index of the image rather than a one-hot vector. You can also set the classes argument to a list containing the names of the classes (i.e. the subdirectory names), in which case index i refers to classes[i]. If you don't set this argument, the classes are sorted by directory name, so index 0 is the class whose directory name comes first in alphanumeric order, index 1 the next one, and so on:

```python
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
        'data/train',
        target_size=(150, 150),
        batch_size=32,
        class_mode='sparse')  # NOTE: 'sparse' yields integer class indices instead of one-hot vectors
```

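(Note: If you don't set the classes argument, make sure that custom_targets[i] corresponds to the i-th class when the directory names are sorted. train_generator.class_indices shows the exact name-to-index mapping the generator uses.)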
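If you let Keras pick the order, one way to stay safe is to build custom_targets directly from the generator's class_indices dictionary (folder name to index). The sketch below simulates that dictionary so it runs without any images; `targets_from_class_indices` and `label_vectors` are hypothetical names. It also shows a common pitfall: directory names that look numeric are still sorted as strings, so '10' comes before '2'.

```python
import numpy as np

def targets_from_class_indices(label_vectors, class_indices):
    """Return an array whose row i is the target of the class mapped to index i.

    class_indices: dict like train_generator.class_indices, e.g. {'cats': 0, 'dogs': 1}
    label_vectors: hypothetical dict mapping folder name -> target vector
    """
    num_classes = len(class_indices)
    dim = len(next(iter(label_vectors.values())))
    targets = np.zeros((num_classes, dim), dtype='float32')
    for name, idx in class_indices.items():
        targets[idx] = label_vectors[name]
    return targets

# Numeric-looking folder names are sorted as strings: '10' < '2'
print(sorted(['1', '2', '10']))  # ['1', '10', '2']

# In practice you would use: class_indices = train_generator.class_indices
class_indices = {'1': 0, '10': 1, '2': 2}
label_vectors = {'1': [1.0, 0.0], '2': [2.0, 0.0], '10': [10.0, 0.0]}
custom_targets = targets_from_class_indices(label_vectors, class_indices)
print(custom_targets[:, 0].tolist())  # [1.0, 10.0, 2.0]
```

Indexing by the mapping itself means the result is correct whatever order Keras chooses.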

Now we can wrap our generator inside another function that takes each batch of images and their integer labels and replaces the labels with our own target vectors:

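```python
def custom_generator(generator):
    for data, labels in generator:
        # look up the custom target vector of each image's class;
        # sparse labels may come back as floats, so cast to int before indexing
        custom_labels = custom_targets[labels.astype(int)]
        yield data, custom_labels
```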
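To check the wrapper without any images on disk, the sketch below feeds it a hypothetical stand-in for train_generator (`fake_sparse_generator`) that yields tiny dummy images and float class indices. Some Keras versions return sparse labels as float32, and NumPy refuses to index with a float array, which is why the labels are cast to int.

```python
import numpy as np

# Hypothetical stand-in for train_generator: yields (images, sparse labels)
def fake_sparse_generator():
    images = np.zeros((3, 8, 8, 3), dtype='float32')  # tiny dummy images
    labels = np.array([2.0, 0.0, 1.0], dtype='float32')  # float class indices
    while True:  # Keras iterators also loop forever
        yield images, labels

# Row i is the (made-up) target vector for class i
custom_targets = np.array([[0.0, 1.0],
                           [2.0, 3.0],
                           [4.0, 5.0]], dtype='float32')

def custom_generator(generator):
    for data, labels in generator:
        # cast to int before fancy indexing into custom_targets
        yield data, custom_targets[labels.astype(int)]

batch_x, batch_y = next(custom_generator(fake_sparse_generator()))
print(batch_y.tolist())  # [[4.0, 5.0], [0.0, 1.0], [2.0, 3.0]]
```

Each image in the batch receives the row of custom_targets selected by its class index, so the image batch passes through unchanged and only the labels are swapped.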

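And that's it! Now we have a custom generator that we can pass to fit_generator (or to predict_generator or evaluate_generator at inference time) like any other generator. Because a plain Python generator has no length, also tell Keras how many batches make up one epoch:

```python
# add the rest of your args; a plain generator has no length, so steps_per_epoch is required
model.fit_generator(custom_generator(train_generator), steps_per_epoch=len(train_generator))
```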