
I have an imbalanced multi-class dataset and I want to use the class_weight argument of fit_generator to weight the classes according to the number of images in each class. I'm using ImageDataGenerator.flow_from_directory to load the dataset from a directory.

Is it possible to directly infer the class_weight argument from the ImageDataGenerator object?

Fábio Perez

7 Answers

Just figured out a way of achieving this.

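from collections import Counter
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator()
train_generator = train_datagen.flow_from_directory(...)

# Number of images per class index
counter = Counter(train_generator.classes)
max_val = float(max(counter.values()))
# The largest class gets weight 1.0; smaller classes get proportionally larger weights
class_weights = {class_id: max_val / num_images for class_id, num_images in counter.items()}

model.fit_generator(...,
                    class_weight=class_weights)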

train_generator.classes holds the class index of each image. Counter(train_generator.classes) counts the number of images in each class.

Note that these weights may not be good for convergence, but you can use them as a basis for other types of occurrence-based weighting.

This answer was inspired by: https://github.com/fchollet/keras/issues/1875#issuecomment-273752868
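
To make the weighting concrete, here is a minimal sketch on a toy label list. The `labels` list is a hypothetical stand-in for `train_generator.classes` (two images of class 1, one image of class 0), not real data.

```python
from collections import Counter

# Hypothetical stand-in for train_generator.classes
labels = [1, 1, 0]

counter = Counter(labels)                 # images per class: {1: 2, 0: 1}
max_val = float(max(counter.values()))    # size of the largest class
class_weights = {class_id: max_val / n for class_id, n in counter.items()}

print(class_weights)  # {1: 1.0, 0: 2.0}
```

The largest class always ends up with weight 1.0, and a class with half as many images gets weight 2.0.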

Fábio Perez
  • But train_generator.classes only returns a list of classes, like a set, no? – Nassim Ben Mar 03 '17 at 19:59
  • It returns a list of classes for each image. For instance, if we have three images, the first two are from class 1 and the last one is from class 0, `train_generator.classes` equals `[1, 1, 0]`. – Fábio Perez Mar 03 '17 at 20:04
  • Indeed, just went to see the source code :) Good job – Nassim Ben Mar 03 '17 at 20:12
  • Hey, thanks for this. Can you elaborate on what you mean by "these weights may not be good for convergence"? – arao6 Dec 28 '19 at 00:32
  • But how do you do this in NumPy, PyTorch or TensorFlow (i.e. when the image labels are a NumPy array)? – Jaja Apr 08 '21 at 10:35

Alternatively, you can simply do:

from sklearn.utils import class_weight
import numpy as np

# Recent scikit-learn versions require keyword arguments here
class_weights = class_weight.compute_class_weight(
                class_weight='balanced',
                classes=np.unique(train_generator.classes),
                y=train_generator.classes)

You can then set (as per comment above):

model.fit_generator(..., class_weight=class_weights)
Pasha Dembo

I tried both solutions and the sklearn.utils.class_weight one gives better accuracy though I am not sure why. They do not both yield the same class weights.
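
For what it's worth, the two schemes are both proportional to 1 / class_count, so they should differ only by a constant factor. The sketch below checks this on hypothetical labels (the `labels` list is made up for illustration), using the 'balanced' formula n_samples / (n_classes * count) in place of the sklearn call. One plausible explanation for the accuracy difference is that the overall scale of the weights changes the magnitude of the loss, which acts much like a change in learning rate.

```python
from collections import Counter

# Hypothetical class indices: 50, 30 and 20 images for classes 0, 1, 2
labels = [0] * 50 + [1] * 30 + [2] * 20

counter = Counter(labels)
n_samples = len(labels)
n_classes = len(counter)
max_count = max(counter.values())

# Scheme 1: largest class count divided by each class count
max_based = {c: max_count / n for c, n in counter.items()}
# Scheme 2: the 'balanced' heuristic, n_samples / (n_classes * count)
balanced = {c: n_samples / (n_classes * n) for c, n in counter.items()}

# The per-class ratio is the same constant for every class
ratios = {c: balanced[c] / max_based[c] for c in counter}
print(ratios)
```

The constant is n_samples / (n_classes * max_count), so the relative weighting between classes is identical in both schemes.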

David Brown

As suggested in the article here, a good way to assign class weights is to use:

(1 / class_count) * (total_count/2)

Thus, slightly modifying the method suggested above by Fábio Perez:

from collections import Counter

counter = Counter(train_generator.classes)
total = float(sum(counter.values()))
# 1.0 avoids integer division on Python 2
class_weight = {class_id: (1.0 / num_images) * (total / 2.0) for class_id, num_images in counter.items()}
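
The divisor 2 in (1 / class_count) * (total_count/2) matches a two-class problem. A possible generalization, shown here only as a sketch, is to divide by the number of classes instead. The helper name `inverse_frequency_weights` and the toy label lists are hypothetical.

```python
from collections import Counter

def inverse_frequency_weights(labels, n_classes=None):
    """Return {class_id: (1 / class_count) * (total_count / n_classes)}."""
    counter = Counter(labels)
    total = float(sum(counter.values()))
    if n_classes is None:
        # Assumption: every class appears at least once in labels
        n_classes = len(counter)
    return {c: (1.0 / n) * (total / n_classes) for c, n in counter.items()}

# Two classes: identical to the formula with the hard-coded 2
binary = inverse_frequency_weights([0] * 90 + [1] * 10)
# Three classes: the divisor becomes 3
multi = inverse_frequency_weights([0] * 50 + [1] * 30 + [2] * 20)
print(binary, multi)
```

With this choice, the weighted sample counts (weight times class count) sum to the total number of samples, so the overall loss scale stays roughly comparable to unweighted training.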
Aman Agrawal

The code suggested by Pasha Dembo works pretty well. However, you should convert the result to a dictionary before passing it to model.fit_generator:

from sklearn.utils import class_weight
import numpy as np

# Recent scikit-learn versions require keyword arguments here
class_weights = class_weight.compute_class_weight(
            class_weight='balanced',
            classes=np.unique(train_generator.classes),
            y=train_generator.classes)

train_class_weights = dict(enumerate(class_weights))
model.fit_generator(..., class_weight=train_class_weights)

DCCoder
Taisa
from sklearn.utils import class_weight
import numpy as np

classes = np.unique(traingen.classes)
weights = class_weight.compute_class_weight(
    class_weight='balanced',
    classes=classes,
    y=traingen.classes)
# Map each class index to its weight
class_weights = dict(zip(classes, weights))
Soheil
  • Your answer could be improved by adding more information on what the code does and how it helps the OP. – Tyler2P May 02 '22 at 07:31

April 2023 version. Ended up using this:

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

unique_classes = np.unique(ds_train.classes)
# "If ‘balanced’, class weights will be given by n_samples / (n_classes * np.bincount(y))."
class_weights = compute_class_weight("balanced", classes=unique_classes, y=ds_train.classes)
class_weight = {class_id: weight for class_id, weight in zip(unique_classes, class_weights)}

model.fit(..., class_weight=class_weight)
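
If the labels are a plain NumPy array rather than a generator attribute, the quoted formula can also be applied by hand. This is only a sketch: `balanced_class_weights` and `labels` are hypothetical names, the labels are assumed to be non-negative integer class indices, and a 2-D input is assumed to be one-hot encoded.

```python
import numpy as np

def balanced_class_weights(y):
    """Apply n_samples / (n_classes * np.bincount(y)) and return a dict."""
    y = np.asarray(y)
    if y.ndim == 2:
        # Assumption: one-hot labels of shape (n_samples, n_classes)
        y = y.argmax(axis=1)
    counts = np.bincount(y)
    # Keep only class ids that actually occur, as np.unique(y) would
    present = np.flatnonzero(counts)
    weights = len(y) / (len(present) * counts[present])
    return {int(c): float(w) for c, w in zip(present, weights)}

labels = np.array([0, 0, 0, 1, 1, 2])   # hypothetical integer labels
class_weight = balanced_class_weights(labels)
print(class_weight)
```

The resulting dictionary has the same shape as the one built above, so it can be passed as the class_weight argument in the same way.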
Spherical Cowboy