
I have the following problem that I am working on:

- I have to create a CNN which takes a 3D image as input and outputs 4 classes (details below).
- All 4 labels must be either 0 or 1: True or False, depending on the input image.

Example output: [0, 1, 0, 1]. This means my prediction is that classes 2 and 4 are good for that image (the application is not relevant).

That being said, I have a tensor of labels of shape [X, 4], where X is the number of samples (or images).

The problem I am facing right now is a huge class imbalance (e.g. for the 3rd class almost 98% of the cases are 1s and only 2% are 0s), and I have no idea how to solve it. I googled it for a good few hours but found no answer. I have used class weighting (from sklearn) before, but it seems I cannot use it this time.

The problem I observed with that class weighting is that it weights each row of the label matrix (i.e. 'what is the weight of [0,1,1,0] in the entire label matrix'), which is obviously not what I want. I want each class to have one weight for 0s and one weight for 1s (which adds up to 8 weights).

I've seen someone try to do this before, so I manually created a function which calculates the weights and outputs the probability of either 0 or 1 for each class (e.g. class 1 weight 0 and class 1 weight 1).
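In vectorised form, what that function computes is roughly this (a sketch, assuming labels is my [X, 4] label matrix of 0s and 1s):

import numpy as np

freq_1 = labels.mean(axis=0)     # per-class fraction of 1s, shape [4]
freq_0 = 1.0 - freq_1            # per-class fraction of 0s, shape [4]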

Following on from this, I must create a dictionary of the weights, e.g. for a single-label classification: {0: 0.9210526315789473, 1: 1.09375}. I need this as the class_weight argument in my model.fit() call.
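For a single-label problem, that dictionary can be obtained with something like this (a sketch using sklearn's compute_class_weight; y here is a hypothetical 1-D label vector, not my [X, 4] matrix):

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y = np.array([0, 1, 1, 0, 1, 1, 1, 0])   # hypothetical single-label targets
weights = compute_class_weight(class_weight='balanced', classes=np.unique(y), y=y)
class_weight_dict = dict(enumerate(weights))   # e.g. {0: ..., 1: ...}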

Obviously, I cannot create a dictionary which has 4 different keys of 0 and 4 keys of 1. What should I do from here?

My first idea was to change the numbers in the labels in the following way:

1st class: 0 = False, 1 = True
2nd class: 2 = False, 3 = True
3rd class: 4 = False, 5 = True
4th class: 6 = False, 7 = True

Basically, I just added a different multiple of 2 to each class's label, so now each row of my label matrix has elements between 0 and 7.

I was able to create the dictionary in the form {0: w0, 1: w1, 2: w2, 3: w3, ...}, which seemed like a good idea to me.
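In code, the remapping and the dictionary construction look roughly like this (a sketch, assuming labels is the original [X, 4] 0/1 matrix and w0 ... w7 are the 8 computed weights):

import numpy as np

shifted = labels + 2 * np.arange(4)      # e.g. row [0, 1, 1, 0] becomes [0, 3, 5, 6]
classWeight = dict(enumerate([w0, w1, w2, w3, w4, w5, w6, w7]))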

Then I faced one more issue: when I fitted my model, the predictions were in the range (0, 1) because I was using a sigmoid activation on the last layer (i.e. Dense(4, activation='sigmoid')). I have never worked with labels that are not between 0 and 1 before, but it kind of made sense to me to change the activation function from sigmoid to linear.

My dictionary of weights at this point looks like this:

{0: 0.8714285714285714,
 1: 0.12857142857142856,
 2: 0.5428571428571428,
 3: 0.45714285714285713,
 4: 0.02857142857142857,
 5: 0.9714285714285714,
 6: 0.8142857142857143,
 7: 0.18571428571428572}

where again, for example, key 6 represents the weight of the 4th class being 0, key 1 represents the weight of the 1st class being 1, and so forth.

With all of this done, my model still behaves strangely. The outputs are not what I expect (i.e. a value between 0 and 1 for the 1st class, a value between 2 and 3 for the 2nd class, and so forth). The accuracy is not stable, it varies a lot, and the validation accuracy just jumps between 0 and 1.

This is what an output looks like right now:

array([[ 0.2878278,  1.3507844, -1.563219 ,  0.5500042]])

Which, obviously, is totally wrong.

I will attach the code for the model and the function I am using to compute the weights (I know it is nested and not vectorised, but it was designed for testing purposes only).

I really hope someone can help me diagnose this problem, either by predicting the right values for each class or by computing the weights in a different manner.

CNN:

import numpy as np
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
from tensorflow.keras.optimizers import SGD, Adam
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.initializers import RandomNormal
from tensorflow.keras.regularizers import l2

callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3)

model = models.Sequential()

model.add(layers.Conv3D(16, (2,2,2) , kernel_regularizer=l2(0.01), strides= (1,1,1),input_shape=images['06S'].shape))
model.add(layers.MaxPooling3D(pool_size=(2,2,2),strides=(1,1,1))) 
model.add(BatchNormalization(epsilon=1e-01,momentum=0.65))
model.add(tf.keras.layers.LeakyReLU(alpha=0.8))
model.add(layers.Dropout(0.7))

model.add(layers.Conv3D(8, (2,2,2) , kernel_regularizer=l2(0.01), strides=(1,1,1)))
model.add(layers.MaxPooling3D(pool_size=(2,2,2),strides=(1,1,1))) 
model.add(BatchNormalization(epsilon=1e-01,momentum=0.65))
model.add(tf.keras.layers.LeakyReLU(alpha=0.8))
model.add(layers.Dropout(0.7))

model.add(layers.Conv3D(4, (2,2,2) , kernel_regularizer=l2(0.01), strides=(1,1,1)))
model.add(layers.MaxPooling3D(pool_size=(2,2,2),strides=(1,1,1))) 
model.add(BatchNormalization(epsilon=1e-01,momentum=0.65))
model.add(tf.keras.layers.LeakyReLU(alpha=0.8))
model.add(layers.Dropout(0.7))


model.add(layers.Conv3D(16, (3,3,3) , kernel_regularizer=l2(0.01),strides=(1,1,1)))
model.add(layers.MaxPooling3D(pool_size=(3,3,3),strides=(1,1,1))) 
model.add(BatchNormalization(epsilon=1e-01,momentum=0.65))
model.add(tf.keras.layers.LeakyReLU(alpha=0.8))
model.add(layers.Dropout(0.7))


model.add(layers.Dense(32,activation=None))
model.add(BatchNormalization(epsilon=1e-04,momentum=0.1))
model.add(tf.keras.layers.LeakyReLU(alpha=0.4))
model.add(layers.Dropout(0.6))


model.add(layers.Dense(16,activation=None))
model.add(BatchNormalization(epsilon=1e-04,momentum=0.1))
model.add(tf.keras.layers.LeakyReLU(alpha=0.4))
model.add(layers.Dropout(0.6))


model.add(layers.Dense(4, activation='linear'))


model.summary()

model.compile(optimizer='adam',
              loss='mse',
              metrics=['accuracy'])

Compute weights:

def class_weighting(arr):
    # arr is indexed as arr[class][sample]: one row per class, one column per sample
    arr_np = np.array(arr)

    for j in range(arr_np.shape[0]):
        ones = 0
        zeros = 0
        for i in range(arr_np.shape[1]):
            if j == 0:
                if arr[j][i] == 1:
                    ones += 1
                else:
                    zeros += 1
                PVI0 = zeros / arr_np.shape[1]
                PVI1 = ones / arr_np.shape[1]
            elif j == 1:
                if arr[j][i] == 1:
                    ones += 1
                else:
                    zeros += 1
                FIBRO0 = zeros / arr_np.shape[1]
                FIBRO1 = ones / arr_np.shape[1]
            elif j == 2:
                if arr[j][i] == 1:
                    ones += 1
                else:
                    zeros += 1
                ROTOR0 = zeros / arr_np.shape[1]
                ROTOR1 = ones / arr_np.shape[1]
            elif j == 3:
                if arr[j][i] == 1:
                    ones += 1
                else:
                    zeros += 1
                ROOF0 = zeros / arr_np.shape[1]
                ROOF1 = ones / arr_np.shape[1]
    return PVI0, PVI1, FIBRO0, FIBRO1, ROTOR0, ROTOR1, ROOF0, ROOF1

Fitting:

PVI0, PVI1, FIBRO0, FIBRO1, ROTOR0, ROTOR1, ROOF0, ROOF1 = class_weighting(arr)
classWeight = {0: PVI0, 1: PVI1, 2: FIBRO0, 3: FIBRO1, 4: ROTOR0, 5: ROTOR1, 6: ROOF0, 7: ROOF1}
history = model.fit(train_dataset, epochs=10, validation_data=val_dataset,
                    class_weight=classWeight)

Answer (by jhso):

You should adjust your loss function to account for the loss weights. Here is the code, and I have some notes below:

from tensorflow import keras
import tensorflow as tf
import numpy as np

loss_scale_dic = {0: 0.8714285714285714,
 1: 0.12857142857142856,
 2: 0.5428571428571428,
 3: 0.45714285714285713,
 4: 0.02857142857142857,
 5: 0.9714285714285714,
 6: 0.8142857142857143,
 7: 0.18571428571428572}

num_class = 4

#convert the loss scale dic to an indexable array
loss_scale = np.array([[loss_scale_dic[i*2+j] for j in range(2)] for i in range(num_class)])

class WeightedMSE(keras.losses.Loss):
    def __init__(self, weights,
                 reduction=keras.losses.Reduction.AUTO,
                 name='weighted_MSE'):
        super().__init__(reduction=reduction, name=name)
        self.weights = weights  # shape [num_class, 2]: column 0 for label 0, column 1 for label 1

    def call(self, y_true, y_pred):
        se = (y_pred - y_true)**2
        # pick, for each class, the weight matching the true label; this scales the SE loss
        weights = self.weights[np.arange(len(self.weights)), y_true]
        return tf.math.reduce_mean(se*weights, 1)

N=5
np.random.seed(1)
y_true = np.zeros([N,num_class],np.int32)
y_true[np.arange(0,N),np.random.randint(0,4,N)] = 1
y_pred = np.random.uniform(0,1,[N,num_class])

loss = WeightedMSE(loss_scale,reduction='none') #scaling
scaled_losses = loss(y_true,y_pred)


loss = WeightedMSE(np.ones_like(loss_scale),reduction='none') #no scaling, standard MSE
losses = loss(y_true,y_pred)

print(losses,'\n',scaled_losses)


Out: tf.Tensor([0.05735777 0.37447205 0.24902505 0.36165863 0.50984228], shape=(5,), dtype=float64) 
tf.Tensor([0.0359915  0.10100629 0.06463426 0.15958925 0.27983722], shape=(5,), dtype=float64)
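
One caveat: the numpy indexing in call only works when the loss is fed numpy labels eagerly, as in the demo above. If you want to pass this loss to model.compile and train with model.fit, the weight lookup should be done with TensorFlow ops instead. A sketch of such a variant (same [num_class, 2] weights, assuming 0/1 labels; the class name is mine):

class WeightedMSEGraph(keras.losses.Loss):
    def __init__(self, weights, name='weighted_MSE_graph', **kwargs):
        super().__init__(name=name, **kwargs)
        # weights: [num_class, 2]; column 0 = weight when the label is 0, column 1 = weight when it is 1
        self.w = tf.constant(weights, dtype=tf.float32)

    def call(self, y_true, y_pred):
        y_true = tf.cast(y_true, tf.float32)
        se = tf.square(y_pred - y_true)
        # for 0/1 labels this selects self.w[:, 1] where the label is 1 and self.w[:, 0] where it is 0
        weights = y_true * self.w[:, 1] + (1.0 - y_true) * self.w[:, 0]
        return tf.reduce_mean(se * weights, axis=-1)

# model.compile(optimizer='adam', loss=WeightedMSEGraph(loss_scale))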

My first note is that you can handle imbalanced classes with a NN without scaling the losses. You might need to adjust your decision threshold for each class (so not a one-to-one comparison with the dominant classes), but there is some output-based fine-tuning you can do without touching your model; see the sketch below.
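For example, with sigmoid outputs you could binarise each class with its own threshold instead of a fixed 0.5 (a sketch; the threshold values here are placeholders you would tune on a validation set):

import numpy as np

thresholds = np.array([0.5, 0.5, 0.9, 0.3])   # hypothetical per-class thresholds
probs = model.predict(val_dataset)            # shape [N, 4], sigmoid outputs assumed
preds = (probs >= thresholds).astype(int)     # per-class binarisation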

Secondly, you are scaling MSE, when I think it would make more sense to scale a cross-entropy loss: see here. A sketch of what I mean follows.
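For instance, tf.nn.weighted_cross_entropy_with_logits already accepts a per-class positive weight. A sketch, assuming the last Dense layer outputs logits (no sigmoid) and reusing the positive weights from loss_scale_dic:

num_class = 4
# positive-class weights: the odd keys of loss_scale_dic
pos_weight = tf.constant([loss_scale_dic[i * 2 + 1] for i in range(num_class)], dtype=tf.float32)

def weighted_bce_from_logits(y_true, y_pred):
    y_true = tf.cast(y_true, tf.float32)
    ce = tf.nn.weighted_cross_entropy_with_logits(labels=y_true, logits=y_pred, pos_weight=pos_weight)
    return tf.reduce_mean(ce, axis=-1)

# model.compile(optimizer='adam', loss=weighted_bce_from_logits, metrics=['accuracy'])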

Thirdly, when using the built-in cross-entropy loss, you only need to provide the positive-class weights in class_weight, because otherwise you are penalising twice. You want to promote the rare positive class, and doing so already has the effect of down-weighting the common classes whenever the rare class is positive.

Which would give you this version:

loss_scale_dic = {0: 0.8714285714285714,
 1: 0.12857142857142856,
 2: 0.5428571428571428,
 3: 0.45714285714285713,
 4: 0.02857142857142857,
 5: 0.9714285714285714,
 6: 0.8142857142857143,
 7: 0.18571428571428572}

num_class = 4
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# convert the loss scale dict to a class_weight dict using only the positive-class (odd) keys
loss_scale = {i: loss_scale_dic[i*2+1] for i in range(num_class)}
history = model.fit(train_dataset, epochs=10, validation_data=val_dataset, class_weight=loss_scale)