
I have 1000 classes in the network and the outputs are multi-label. For each training example the number of positive outputs is the same (i.e. 10), but they can be assigned to any of the 1000 classes. So 10 classes have output 1 and the remaining 990 have output 0.

For the multi-label classification, I am using binary cross-entropy as the cost function and sigmoid as the activation function. When I tried the rule of 0.5 as the cut-off between 1 and 0, all of the outputs were 0. I understand this is a class imbalance problem. From this link, I understand that I might have to create extra output labels. Unfortunately, I haven't been able to figure out how to incorporate that into a simple neural network in Keras.

from keras.layers import Input, Dense
from keras.models import Model
from keras.optimizers import Adam

nclasses = 1000

# if we wanted to correct for the imbalance problem:
#class_weight = {k: len(Y_train)/(nclasses*(Y_train==k).sum()) for k in range(nclasses)}


inp = Input(shape=[X_train.shape[1]])
x = Dense(5000, activation='relu')(inp)
x = Dense(4000, activation='relu')(x)
x = Dense(3000, activation='relu')(x)
x = Dense(2000, activation='relu')(x)
x = Dense(nclasses, activation='sigmoid')(x)
model = Model(inputs=[inp], outputs=[x])

adam = Adam(lr=0.00001)
# pass the optimizer object rather than the string 'adam',
# otherwise the custom learning rate is ignored
model.compile(optimizer=adam, loss='binary_crossentropy')
history = model.fit(
    X_train, Y_train, batch_size=32, epochs=50, verbose=0, shuffle=False)

Could anyone help me with the code here? I would also highly appreciate it if you could suggest a good accuracy metric for this problem.

Thanks a lot :) :)

desertnaut
Mahmud Sabbir

1 Answer


I have a similar problem and unfortunately have no answer for most of the questions, especially the class imbalance problem.

In terms of metrics there are several possibilities: in my case, I use the top 1/2/3/4/5 results and check if one of them is right. Because in your case you always have the same number of labels equal to 1, you could take your top 10 results, see what percentage of them is right, and average this result over your batch. I didn't find a way to include this algorithm as a Keras metric. Instead, I wrote a callback which calculates the metric at epoch end on my validation data set.
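
A minimal sketch of that idea, assuming plain numpy arrays for the validation labels and scores (`top_k_hit_rate`, `X_val`, `y_val` are made-up names):

```python
import numpy as np

def top_k_hit_rate(y_true, y_pred, k=10):
    """Average fraction of the k highest-scored classes per sample
    that are actually positive."""
    top_k = np.argsort(y_pred, axis=1)[:, -k:]        # indices of the k largest scores
    hits = np.take_along_axis(y_true, top_k, axis=1)  # 1 where a top-k guess is a true label
    return hits.mean()

# Inside a keras.callbacks.Callback subclass this could run at epoch end:
#     preds = self.model.predict(self.X_val)
#     print(top_k_hit_rate(self.y_val, preds, k=10))
```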

Also, if you predict the top n results on a test dataset, see how many times each class is predicted. The `Counter` class from Python's `collections` module is really convenient for this purpose.
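
For example, with toy sigmoid scores over 4 hypothetical classes:

```python
import numpy as np
from collections import Counter

# toy network outputs for 3 samples over 4 classes
y_pred = np.array([[0.9, 0.2, 0.8, 0.1],
                   [0.7, 0.6, 0.9, 0.2],
                   [0.8, 0.1, 0.9, 0.3]])
top3 = np.argsort(y_pred, axis=1)[:, -3:]   # top-3 class indices per sample
counts = Counter(top3.ravel().tolist())     # how often each class is predicted
print(counts)                               # classes 0 and 2 dominate here
```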

Edit: I found a method to include class weights without splitting the output. You need a 2D numpy array containing weights with shape [number of classes to predict, 2 (background and signal)]. Such an array could be calculated with this function:

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

def calculating_class_weights(y_true):
    # one (background, signal) weight pair per output class
    number_dim = np.shape(y_true)[1]
    weights = np.empty([number_dim, 2])
    for i in range(number_dim):
        weights[i] = compute_class_weight('balanced',
                                          classes=np.array([0., 1.]),
                                          y=y_true[:, i])
    return weights
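
A quick check of the weight computation on toy targets (note that recent scikit-learn versions require `classes` and `y` as keyword arguments; the toy `y_true` below is made up):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# 4 samples, 2 output classes; each column must contain both 0 and 1
y_true = np.array([[0, 1],
                   [0, 1],
                   [0, 1],
                   [1, 0]])

weights = np.empty([y_true.shape[1], 2])
for i in range(y_true.shape[1]):
    weights[i] = compute_class_weight('balanced',
                                      classes=np.array([0, 1]),
                                      y=y_true[:, i])
print(weights)
# first output: class 1 occurs once in four samples, so its signal
# weight is 4 / (2 * 1) = 2.0 and the background weight 4 / (2 * 3) ≈ 0.67
```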

The solution is now to build your own binary cross-entropy loss function in which you multiply in the weights yourself:

from keras import backend as K

def get_weighted_loss(weights):
    def weighted_loss(y_true, y_pred):
        # weights[:, 0] applies where y_true == 0, weights[:, 1] where y_true == 1
        return K.mean(
            (weights[:, 0] ** (1 - y_true)) * (weights[:, 1] ** y_true)
            * K.binary_crossentropy(y_true, y_pred),
            axis=-1)
    return weighted_loss

weights[:,0] is an array with all the background weights and weights[:,1] contains all the signal weights.
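
A tiny numeric check of why the exponent trick works: for each element exactly one of the two exponents is zero, so the product reduces to the weight matching the true label (the 0.6/2.0 values are made up):

```python
import numpy as np

w0, w1 = 0.6, 2.0                 # background and signal weight for one output
factors = []
for y in (0.0, 1.0):
    factor = w0 ** (1 - y) * w1 ** y
    factors.append(factor)
    print(y, factor)              # y = 0 selects w0, y = 1 selects w1
```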

All that is left is to include this loss into the compile function:

model.compile(optimizer=Adam(), loss=get_weighted_loss(class_weights))
Karl
dennis-w
  • I really like this answer! By the way, in case anyone else runs into this problem: if you save a model that was trained using this custom loss function and want to load it again, you'll get an "unknown loss function" error. This can be overcome by setting the "custom_objects" parameter, e.g. `model = load_model("path/to/model.hd5f", custom_objects={"weighted_loss": get_weighted_loss(weights)})` – Karl Jun 06 '18 at 13:19
    Can someone please shed some light on how the weighting formula `K.mean((weights[:,0]**(1-y_true))*(weights[:,1]**(y_true))*K.binary_crossentropy(y_true, y_pred), axis=-1)` was constructed? Thanks. – Hendrik Jun 06 '18 at 23:16
  • I explain it here in my last post https://github.com/keras-team/keras/issues/2592#issuecomment-387579022 . Basically everything is a vector of shape (number_samples, number_outputs) inside K.mean. New are the weight factors, which are constructed so that one of them is one and the other is the corresponding weight. – dennis-w Jun 07 '18 at 06:19
  • Just a correction (can't edit comments?). Should be load_model("path/to/model.h5", ...) not .hd5f – Karl Jun 07 '18 at 10:24
  • A small addition: If you save the model (with the addition of custom object) and you load the model for inference only, you don't need to have the function anymore. Just use another keras loss function: custom_objects={"weighted_loss": some_other_loss_function}. It won't matter because you won't use it anyway. Therefore you don't have to copy the code for the loss function in your inference code. – dennis-w Jun 07 '18 at 11:15
  • I have trouble getting it to work with my setup for some reason: my y_true shape is this: `(6790, 23)`. obviously, I have 6790 samples with 23 possible labels each, the 23-dimensional vectors having 1 or 0 independently for each label. Now when I pass my y_true to your function `calculating_class_weight`, I get `ValueError: classes should have valid labels that are in y`. I'm not sure why – axolotl Nov 25 '18 at 20:39
  • just to make sure I'm not using a wrong function for my use case, here's why I'm trying to use it: I have 23 labels in a multi-label classification problem. Each sample has 1 or more labels. However, the number of samples for each label is not balanced, so my model is basically guessing the most dominant labels for all of them. this is why I wanted to balance them. am I in the right place looking for this function? – axolotl Nov 25 '18 at 20:42
  • I think so; this function is designed so that the network thinks every label is equally true and false. Also, keep in mind that if you want to do EarlyStopping, do it on a non-weighted metric. – dennis-w Nov 26 '18 at 08:49
  • Does anyone know how to change the code so that it would work with keras 2.0 – Tuong Nguyen Minh Sep 02 '20 at 12:43
  • to get rid of the `classes should have valid labels` I have replaced `compute_class_weight('balanced', [0.,1.], y_true[:, i])` with `compute_class_weight('balanced', np.unique(y_true[:, i]), y_true[:, i])` – Valentin Brasso Mar 10 '22 at 13:33