
So, when I run the same code on a 50/50 split of labels 0 and 1, I get about 70% accuracy on the validation set and my validation predictions are not stuck at 0.

However, when I run the code on a dataset with an 84/16% split of labels 0 and 1, all my validation predictions end up being 0. I used both cross-entropy loss and BCEWithLogitsLoss with a weight vector (though I am not sure I set the weight vector correctly). Also, is weight = torch.tensor([0.84, 0.16]) correct?

How can I fix this problem?
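
For context, here is how I thought the weight vector might be derived for the 84/16 split (just a sketch of my reasoning; I am not sure it is right, which is part of my question):

    import torch

    # My (possibly wrong) understanding: each class weight should be inversely
    # proportional to its class frequency, so the minority class 1 gets the larger weight.
    counts = torch.tensor([84.0, 16.0])   # class 0 / class 1 percentages
    weight = counts.sum() / (2 * counts)  # inverse-frequency weights
    print(weight)                         # tensor([0.5952, 3.1250])

Here is the relevant part of my code: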

        loss_type = 'BCEWithLogitsLoss'
        if loss_type == 'BCEWithLogitsLoss':
            self.criterion = nn.BCEWithLogitsLoss(reduction='none') # weighted loss for imbalanced dataset
            #self.criterion = nn.BCEWithLogitsLoss() 
        elif loss_type == 'CrossEntropyLoss':
            self.criterion = nn.CrossEntropyLoss() # this should work for binary classification


        if loss_type == 'BCEWithLogitsLoss':
            labels = torch.as_tensor(labels, dtype=torch.float32) # we need float labels for BCEWithLogitsLoss
            weight = torch.tensor([0.84, 0.16]) # how do I decide on these weights?
            #weight = torch.tensor([0.5, 0.5])
            weight_ = weight[labels.data.view(-1).long()].view_as(labels)
            m = nn.Sigmoid()
            with torch.cuda.amp.autocast():
                loss = self.criterion(m(out[:,1]-out[:,0]), labels.cuda())    
                loss_class_weighted = loss * weight_.cuda()
                loss_class_weighted = loss_class_weighted.mean()
                loss = loss_class_weighted
        elif loss_type == 'CrossEntropyLoss':
            labels = torch.as_tensor(labels)
            with torch.cuda.amp.autocast():   
                loss = self.criterion(out, labels.cuda())
                print('loss: ', loss)
       
        pred_labels = out.data.max(1)[1]
        #pred_labels = out.argmax(dim=1)
        labels = labels.int()       
        return pred_labels, labels, loss
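
An alternative I considered (but have not verified) is to let PyTorch apply the class weights directly, instead of multiplying per-sample weights by hand:

    import torch
    import torch.nn as nn

    # BCEWithLogitsLoss: pos_weight rescales the loss of the positive class;
    # with 84% negatives and 16% positives that would be num_neg / num_pos = 5.25.
    criterion_bce = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([84.0 / 16.0]))

    # CrossEntropyLoss: one weight per class, with the minority class weighted higher.
    criterion_ce = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 84.0 / 16.0]))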

The output for the imbalanced dataset is:

train_epoch_accuracy:  0.8515625
not test
Evaluating...
epoch is:  49
evaluating...
epoch val acc:  tensor(0.8541, device='cuda:0')
val_epoch_accuracy:  0.8426966292134831
best val acc:  tensor(0.8541, device='cuda:0')
best epoch:  1
best preds:  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
best val labels:  [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]

Here's also the output for the balanced dataset:

train_epoch_accuracy:  0.9756944444444444
not test
Evaluating...
epoch is:  49
evaluating...
epoch val acc:  tensor(0.4453, device='cuda:0')
val_epoch_accuracy:  0.5876288659793815
best val acc:  tensor(0.7422, device='cuda:0')
best epoch:  28
best preds:  [1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0]
best val labels:  [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0]
  • see also [this answer](https://stackoverflow.com/a/58213245/1714410) and [this](https://stackoverflow.com/a/65766155/1714410). – Shai Feb 28 '22 at 16:47
  • @Shai So I am curious why the weighted BCEWithLogitsLoss that was introduced for imbalanced data doesn't work. I honestly don't think my question should be closed! – Mona Jalal Feb 28 '22 at 16:48
  • This is a bit complicated to explain in a comment or post - but the focal loss paper explains why weighting the CE does not do the trick here. – Shai Feb 28 '22 at 17:30
  • Basically, focal loss and hard negative mining affect the _gradients_ of the loss, while the weighting of the CE affects the _value_ of the loss, more than the gradient. – Shai Feb 28 '22 at 17:49

0 Answers