Is it possible to automatically infer the class_weight from flow_from_directory in Keras?

Question

I have an imbalanced multi-class dataset and I want to use the class_weight argument from fit_generator to give weights to the classes according to the number of images of each class. I'm using ImageDataGenerator.flow_from_directory to load the dataset from a directory.

Is it possible to directly infer the class_weight argument from the ImageDataGenerator object?

I don't think this is possible. Why can't you just compute it once? — Nassim Ben, Mar 03 '17 at 19:34

Fábio Perez · Accepted Answer · 2017-03-08T10:55:17.377

score 40

Just figured out a way of achieving this.

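from collections import Counter
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator()
train_generator = train_datagen.flow_from_directory(...)

# Count how many images belong to each class index
counter = Counter(train_generator.classes)
max_val = float(max(counter.values()))
# Weight each class by (size of the largest class) / (size of this class)
class_weights = {class_id: max_val / num_images for class_id, num_images in counter.items()}

model.fit_generator(...,
                    class_weight=class_weights)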

train_generator.classes contains one class index per image, so Counter(train_generator.classes) counts the number of images in each class.

Note that these weights may not be good for convergence, but you can use them as a basis for other types of occurrence-based weighting.
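To make the weighting concrete, here is a minimal sketch that runs without Keras. The `classes` list is a hypothetical stand-in for `train_generator.classes` (one class index per image), and the same max-count rule from the answer is applied to it.

```python
from collections import Counter

# Hypothetical stand-in for train_generator.classes: one class index per image.
classes = [0, 0, 0, 0, 1, 1, 2]

counter = Counter(classes)                  # {0: 4, 1: 2, 2: 1}
max_val = float(max(counter.values()))      # size of the largest class
class_weights = {class_id: max_val / num_images
                 for class_id, num_images in counter.items()}

print(class_weights)  # {0: 1.0, 1: 2.0, 2: 4.0}
```

The largest class always gets weight 1.0, and a class with a quarter as many images gets 4.0.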

This answer was inspired by: https://github.com/fchollet/keras/issues/1875#issuecomment-273752868

edited Mar 08 '17 at 10:55

answered Mar 03 '17 at 19:43

Fábio Perez


But train_generator.classes only returns a list of the classes, like a set, no? – Nassim Ben Mar 03 '17 at 19:59

It returns a list of classes for each image. For instance, if we have three images, the first two are from class 1 and the last one is from class 0, `train_generator.classes` equals `[1, 1, 0]`. – Fábio Perez Mar 03 '17 at 20:04

Indeed, just went to see the source code :) Good job – Nassim Ben Mar 03 '17 at 20:12

Hey, thanks for this. Can you elaborate what you mean by "these weights may not be good for convergence"? – arao6 Dec 28 '19 at 00:32
But how can this be done in NumPy, PyTorch or TensorFlow (i.e. when the image labels are NumPy arrays)? – Jaja Apr 08 '21 at 10:35
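The thread does not answer this last comment. As a hedged sketch, assuming the labels are a NumPy array of integer class indices (or a one-hot matrix), the "balanced" rule that scikit-learn documents, n_samples / (n_classes * np.bincount(y)), can be computed with NumPy alone. `balanced_class_weights` is a hypothetical helper name, not part of any library.

```python
import numpy as np


def balanced_class_weights(labels):
    """Hypothetical helper: 'balanced' class weights from a NumPy label array.

    Accepts a 1-D array of integer class indices or a 2-D one-hot matrix,
    which is converted to indices with argmax.
    """
    labels = np.asarray(labels)
    if labels.ndim == 2:                      # one-hot -> class indices
        labels = labels.argmax(axis=1)
    classes, counts = np.unique(labels, return_counts=True)
    weights = labels.size / (len(classes) * counts)
    # Map each label that actually occurs to its weight.
    return {int(c): float(w) for c, w in zip(classes, weights)}


y = np.array([0, 0, 0, 0, 1, 1, 2])
print(balanced_class_weights(y))
```

Using `np.unique` instead of `np.bincount` skips label values that never occur, so no class ends up with a division by zero.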

score 17 · Answer 2 · answered Jul 23 '18 at 16:09

Alternatively, you can simply do:

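from sklearn.utils import class_weight
import numpy as np

# Newer scikit-learn versions require classes and y as keyword arguments.
# The result is a NumPy array with one weight per entry of classes.
class_weights = class_weight.compute_class_weight(
               'balanced',
               classes=np.unique(train_generator.classes),
               y=train_generator.classes)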

You can then set (as per comment above):

model.fit_generator(..., class_weight=class_weights)

score 1 · Answer 3 · answered Sep 14 '18 at 19:51


I tried both solutions, and the sklearn.utils.class_weight one gives better accuracy, though I am not sure why. The two do not yield the same class weights.


David Brown


If you look at the ratio of weights of classes in each case, it is the same. – Gautam J Oct 03 '19 at 14:59
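This can be checked directly. scikit-learn's "balanced" weights are n_samples / (n_classes * count), while the accepted answer uses max_count / count, so the two dictionaries differ only by the constant factor n_samples / (n_classes * max_count). A stdlib-only sketch on a hypothetical label list:

```python
from collections import Counter

# Hypothetical labels; any list of class indices works.
classes = [0, 0, 0, 0, 1, 1, 2]
counter = Counter(classes)
n_samples, n_classes = len(classes), len(counter)
max_count = max(counter.values())

# Accepted answer: the largest class gets weight 1.
max_based = {c: max_count / n for c, n in counter.items()}
# scikit-learn 'balanced' formula: n_samples / (n_classes * count).
balanced = {c: n_samples / (n_classes * n) for c, n in counter.items()}

# The ratio balanced / max_based is the same constant for every class.
ratios = {c: balanced[c] / max_based[c] for c in counter}
print(ratios)
```

So both schemes weight the classes in the same proportions and differ only in the overall scale of the weighted loss. With plain SGD, a constant factor on the loss acts like a change of learning rate, which may be one reason the observed accuracies differ, although the thread does not establish this.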
How much of a difference did you get? @David Brown – Anshuman Kumar May 27 '20 at 08:40

score 1 · Answer 4 · answered Jan 13 '21 at 06:23

As suggested in the article here, a good way to assign class weights is to use:

(1 / class_count) * (total_count/2)

Thus, slightly modifying the method suggested above by Fábio Perez:

from collections import Counter

counter = Counter(train_generator.classes)
total = float(sum(counter.values()))
# weight = (1 / class_count) * (total_count / 2)
class_weight = {class_id: (1 / num_images) * total / 2.0 for class_id, num_images in counter.items()}
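The 2.0 in that formula corresponds to the number of classes in a binary problem. For the multi-class setting of the question, a hedged generalization divides by the number of classes K instead, which reproduces scikit-learn's "balanced" formula n_samples / (n_classes * count). A sketch with a hypothetical label list in place of `train_generator.classes`:

```python
from collections import Counter

# Hypothetical stand-in for train_generator.classes.
classes = [0, 0, 0, 0, 1, 1, 2]

counter = Counter(classes)
total = float(sum(counter.values()))
num_classes = len(counter)

# (1 / class_count) * (total_count / K); with K = 2 this is the formula above.
class_weight = {class_id: (1.0 / num_images) * total / num_classes
                for class_id, num_images in counter.items()}
print(class_weight)
```

With this choice, the sum of count × weight over all classes equals the dataset size, so the average weight per sample stays at 1.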

score 1 · Answer 5 · edited May 24 '21 at 20:30

The code suggested by Pasha Dembo works well. However, compute_class_weight returns a NumPy array, so you should convert it into a dictionary before passing it to fit_generator:

from sklearn.utils import class_weight
import numpy as np

class_weights = class_weight.compute_class_weight(
           'balanced',
           classes=np.unique(train_generator.classes),
           y=train_generator.classes)

# Map class index -> weight, the format class_weight expects
train_class_weights = dict(enumerate(class_weights))
model.fit_generator(..., class_weight=train_class_weights)


score 1 · Answer 6 · answered Apr 29 '22 at 16:44


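from sklearn.utils import class_weight
import numpy as np

# traingen is the iterator returned by flow_from_directory.
# Compute 'balanced' weights and pair each one with its class label,
# giving the {class_index: weight} dict that class_weight expects.
class_weights = dict(zip(np.unique(traingen.classes), class_weight.compute_class_weight(
                        class_weight='balanced',
                        classes=np.unique(traingen.classes),
                        y=traingen.classes)))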
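Pairing the weights with np.unique(...) rather than enumerate (as in the previous answer) only matters if the labels are not exactly 0..K-1, for instance in a hypothetical case where one class folder contains no images. A stdlib-only sketch of the difference, using sorted(Counter) as a stand-in for np.unique:

```python
from collections import Counter

labels = [0, 0, 2]  # hypothetical: class 1 has no images
counter = Counter(labels)
unique_classes = sorted(counter)                       # like np.unique -> [0, 2]
weights = [len(labels) / (len(counter) * counter[c])   # 'balanced' formula
           for c in unique_classes]                    # [0.75, 1.5]

by_position = dict(enumerate(weights))           # keys are positions 0, 1
by_label = dict(zip(unique_classes, weights))    # keys are the real labels 0, 2

print(by_position)  # {0: 0.75, 1: 1.5}
print(by_label)     # {0: 0.75, 2: 1.5}
```

With enumerate, class 2's weight is attached to class 1 and class 2 gets no entry. When every class folder contains images, the labels cover 0..K-1 and both forms give the same dictionary.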


Soheil


Your answer could be improved by adding more information on what the code does and how it helps the OP. – Tyler2P May 02 '22 at 07:31

score 0 · Answer 7 · answered Apr 17 '23 at 09:36

April 2023 version. Ended up using this:

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

unique_classes = np.unique(ds_train.classes)
# "If ‘balanced’, class weights will be given by n_samples / (n_classes * np.bincount(y))."
class_weights = compute_class_weight("balanced", classes=unique_classes, y=ds_train.classes)
class_weight = {class_id: weight for class_id, weight in zip(unique_classes, class_weights)}

model.fit(..., class_weight=class_weight)
