UNet model accuracy is stuck at exactly 0.5 (neither more nor less) (no class imbalance, tried tuning the learning rate)

Question

This is using PyTorch

I have been trying to train a UNet model on my images; however, the model accuracy is always exactly 0.5. The loss does decrease.

I have also checked for class imbalance and tried different learning rates. The learning rate affects the loss but not the accuracy.

My architecture is below (from here):

""" `UNet` class is based on https://arxiv.org/abs/1505.04597

The U-Net is a convolutional encoder-decoder neural network.
Contextual spatial information (from the decoding,
expansive pathway) about an input tensor is merged with
information representing the localization of details
(from the encoding, compressive pathway).

Modifications to the original paper:
(1) padding is used in 3x3 convolutions to prevent loss
    of border pixels
(2) merging outputs does not require cropping due to (1)
(3) residual connections can be used by specifying
    UNet(merge_mode='add')
(4) if non-parametric upsampling is used in the decoder
    pathway (specified by upmode='upsample'), then an
    additional 1x1 2d convolution occurs after upsampling
    to reduce channel dimensionality by a factor of 2.
    This channel halving happens with the convolution in
    the transpose convolution (specified by upmode='transpose')


    Arguments:
        in_channels: int, number of channels in the input tensor.
                     Default is 3 for RGB images. Our SPARCS dataset is 13 channel.
              depth: int, number of MaxPools in the U-Net. During training, input size needs to be 
                     (depth-1) times divisible by 2
        start_filts: int, number of convolutional filters for the first conv.
            up_mode: string, type of upconvolution. Choices: 'transpose' for transpose convolution 

"""

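import torch
import torch.nn as nn
from torch.nn import init

# DownConv, UpConv and conv1x1 are helper blocks from the linked implementation (not shown here).

class UNet(nn.Module):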

    def __init__(self, num_classes, depth, in_channels, start_filts=16, up_mode='transpose', merge_mode='concat'):

        super(UNet, self).__init__()

        if up_mode in ('transpose', 'upsample'):
            self.up_mode = up_mode
        else:
            raise ValueError("\"{}\" is not a valid mode for upsampling. Only \"transpose\" and \"upsample\" are allowed.".format(up_mode))

        if merge_mode in ('concat', 'add'):
            self.merge_mode = merge_mode
        else:
            raise ValueError("\"{}\" is not a valid mode for merging up and down paths. Only \"concat\" and \"add\" are allowed.".format(merge_mode))

        # NOTE: up_mode 'upsample' is incompatible with merge_mode 'add'
        if self.up_mode == 'upsample' and self.merge_mode == 'add':
            raise ValueError("up_mode \"upsample\" is incompatible with merge_mode \"add\" at the moment "
                             "because it doesn't make sense to use nearest neighbour to reduce depth channels (by half).")

        self.num_classes = num_classes
        self.in_channels = in_channels
        self.start_filts = start_filts
        self.depth = depth

        self.down_convs = []
        self.up_convs = []

        # create the encoder pathway and add to a list
        for i in range(depth):
            ins = self.in_channels if i == 0 else outs
            outs = self.start_filts*(2**i)
            pooling = True if i < depth-1 else False

            down_conv = DownConv(ins, outs, pooling=pooling)
            self.down_convs.append(down_conv)

        # create the decoder pathway and add to a list
        # - careful! decoding only requires depth-1 blocks
        for i in range(depth-1):
            ins = outs
            outs = ins // 2
            up_conv = UpConv(ins, outs, up_mode=up_mode, merge_mode=merge_mode)
            self.up_convs.append(up_conv)


        self.conv_final = conv1x1(outs, self.num_classes)

        # add the list of modules to current module
        self.down_convs = nn.ModuleList(self.down_convs)
        self.up_convs = nn.ModuleList(self.up_convs)

        self.reset_params()

    @staticmethod
    def weight_init(m):
        if isinstance(m, nn.Conv2d):

            #https://prateekvjoshi.com/2016/03/29/understanding-xavier-initialization-in-deep-neural-networks/ 
            ##Doc: https://pytorch.org/docs/stable/nn.init.html?highlight=xavier#torch.nn.init.xavier_normal_ 
            init.xavier_normal_(m.weight)
            init.constant_(m.bias, 0)



    def reset_params(self):
        for i, m in enumerate(self.modules()):
            self.weight_init(m)


    def forward(self, x):
        encoder_outs = []

        # encoder pathway, save outputs for merging
        for i, module in enumerate(self.down_convs):
            x, before_pool = module(x)
            encoder_outs.append(before_pool)

        for i, module in enumerate(self.up_convs):
            before_pool = encoder_outs[-(i+2)]
            x = module(before_pool, x)

        # No softmax is used. This means we need to use
        # nn.CrossEntropyLoss in the training script,
        # as that loss already includes a softmax.
        x = self.conv_final(x)
        return x
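
The docstring's input-size requirement can be checked before training. A minimal sketch, assuming each pooling step in `DownConv` halves the height and width (2x2 max pooling with stride 2, which is not shown in the question); the helper name `check_input_size` is hypothetical:

```python
def check_input_size(height, width, depth):
    """Return True if both sides can be halved (depth - 1) times without remainder."""
    # depth - 1 pooling steps each divide the spatial size by 2
    factor = 2 ** (depth - 1)
    return height % factor == 0 and width % factor == 0


# With depth=5 there are 4 pooling steps, so sizes should be multiples of 16.
ok_256 = check_input_size(256, 256, depth=5)
ok_250 = check_input_size(250, 250, depth=5)
print(ok_256, ok_250)
```

If the check fails, the upsampled decoder feature maps would presumably not match the saved encoder outputs when they are merged.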

The parameters are:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
x,y = train_sequence[0] ; batch_size = x.shape[0]
model = UNet(num_classes = 2, depth=5, in_channels=5, merge_mode='concat').to(device)
optim = torch.optim.Adam(model.parameters(),lr=0.01, weight_decay=1e-3)
criterion = nn.BCEWithLogitsLoss() #has sigmoid internally
epochs = 1000
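
As a hedged aside on the loss chosen above: `nn.BCEWithLogitsLoss` expects a float target with the same shape as the logits, whereas `nn.CrossEntropyLoss` expects integer class indices of shape [batch, H, W]. The sketch below uses random tensors rather than the data from the question (all sizes are assumptions) to show both conventions for a 2-channel output:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in for the UNet output: [batch, num_classes, H, W] (sizes are assumptions).
logits = torch.randn(4, 2, 8, 8)
# Class-index mask: [batch, H, W], integer labels in {0, 1}.
index_mask = torch.randint(0, 2, (4, 8, 8))

# CrossEntropyLoss consumes the class-index mask directly.
ce_loss = nn.CrossEntropyLoss()(logits, index_mask)

# BCEWithLogitsLoss needs a float target shaped like the logits,
# e.g. a one-hot mask with the class axis moved to dim 1.
one_hot_mask = F.one_hot(index_mask, num_classes=2).permute(0, 3, 1, 2).float()
bce_loss = nn.BCEWithLogitsLoss()(logits, one_hot_mask)

print(logits.shape, index_mask.shape, one_hot_mask.shape)
print(ce_loss.item(), bce_loss.item())
```

Which layout `y` actually has in `train_sequence` is not shown in the question, so this only illustrates the two options.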

The training function is:

import torch.nn.functional as f


def train_model(epoch, train_sequence):
    """Train the model for one pass over the training data.
    Args:
        epoch (int): index of the current epoch (not used inside the loop)
        train_sequence: indexable sequence of (X, y) NumPy batches
    Relies on the module-level model, criterion, optim and device.
    """
    model.train()
    for idx in range(len(train_sequence)):        
        X, y = train_sequence[idx]             
        # Variable is a deprecated no-op wrapper since PyTorch 0.4; plain tensors suffice
        images = torch.from_numpy(X).to(device) # [batch, channel, H, W]
        masks = torch.from_numpy(y).to(device)

        outputs = model(images)
        print(masks.shape, outputs.shape)
        loss = criterion(outputs, masks)
        optim.zero_grad()
        loss.backward()
        # Update weights
        optim.step()
    # total_loss = get_loss_train(model, data_train, criterion)

My function for calculating loss and accuracy is below:

def get_loss_train(model, train_sequence):
    """
        Calculate mean accuracy and mean loss over the training set
    """
    model.eval()
    total_acc = 0
    total_loss = 0
    for idx in range(len(train_sequence)):        
        with torch.no_grad():
            X, y = train_sequence[idx]             
            # Variable is a deprecated no-op wrapper since PyTorch 0.4; plain tensors suffice
            images = torch.from_numpy(X).to(device) # [batch, channel, H, W]
            masks = torch.from_numpy(y).to(device)

            outputs = model(images)
            loss = criterion(outputs, masks)
            preds = torch.argmax(outputs, dim=1).float()
            acc = accuracy_check_for_batch(masks.cpu(), preds.cpu(), images.size()[0])
            total_acc = total_acc + acc
            total_loss = total_loss + loss.cpu().item()
    return total_acc/(len(train_sequence)), total_loss/(len(train_sequence))
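
One possible explanation for a value of exactly 0.5, offered as an assumption because neither `accuracy_check_for_batch` nor the mask layout is shown: if the masks are one-hot with shape [2, H, W] per image and the helper compares them elementwise with the [H, W] argmax prediction, NumPy broadcasting makes every predicted pixel match exactly one of the two mask channels. The function `elementwise_accuracy` below is a hypothetical stand-in for that helper, and the data is random:

```python
import numpy as np
import torch
import torch.nn.functional as F


def elementwise_accuracy(mask, pred):
    # Hypothetical stand-in for the per-image check inside accuracy_check_for_batch.
    mask, pred = mask.numpy(), pred.numpy()
    return np.equal(mask, pred).sum() / mask.size


torch.manual_seed(0)
logits = torch.randn(2, 16, 16)                          # one image: [classes, H, W]
labels = torch.randint(0, 2, (16, 16))                   # class indices: [H, W]
one_hot = F.one_hot(labels, 2).permute(2, 0, 1).float()  # one-hot mask: [2, H, W]
pred = torch.argmax(logits, dim=0).float()               # prediction: [H, W]

# [2, H, W] vs [H, W] broadcasts; each pixel matches one channel and misses the other.
acc_one_hot = elementwise_accuracy(one_hot, pred)
# Comparing against class indices gives an ordinary pixel accuracy instead.
acc_index = elementwise_accuracy(labels.float(), pred)
print(acc_one_hot, acc_index)
```

Under that assumption, converting the mask to class indices first (for example `torch.argmax(masks, dim=1)`) before comparing with `preds` would give a meaningful accuracy.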

Edit: code that calls the functions:

for epoch in range(epochs):
    train_model(epoch, train_sequence)
    train_acc, train_loss = get_loss_train(model,train_sequence)
    print("Train Acc:", train_acc)
    print("Train loss:", train_loss)

Can someone help me identify why the accuracy is always exactly 0.5?

Did you have a look at the actual training loss (not the accuracy)? Ideally, this should consistently decrease over training time. If it does, your accuracy calculation is likely wrong; otherwise it could be a training issue. With the amount of code you posted, it is unfortunately hard to tell exactly what is going wrong; please see [mcve]. — dennlinger, Mar 06 '20 at 09:34
The function `get_loss_train` does not update the weights: the model is in `eval` mode and it is executed within `no_grad`. Was this code expected to train your model, or only to get the stats from your training set? — JoOkuma, Mar 06 '20 at 12:58
@dennlinger I checked the loss. It is decreasing well, so that doesn't appear to be the issue. I edited the code to include the function calls. It would be helpful if you could give me more insights. — Sulphur, Mar 06 '20 at 18:11
@JoOkuma Thank you for the reply. I have updated the code to include the `train` function as well as the code that calls both `train` and `train_loss`. Could you please see if the edits make sense? Thank you so much again! — Sulphur, Mar 06 '20 at 18:16
The `zero_grad` call is in the wrong position. The gradients are computed in the forward pass (`model(images)`), so you are deleting them right before backpropagation. `zero_grad` should come before the forward pass, not between it and the backward call. — JoOkuma, Mar 06 '20 at 18:20
@JoOkuma That would mean something like this : `optim.zero_grad()` `outputs = model(images)` `loss = criterion(outputs, masks)` `loss.backward()` `optim.step()` Sorry, I'm unable to make a code block in the comments :( — Sulphur, Mar 06 '20 at 18:25
Yes, and one additional tip, if the area of the foreground label is much smaller than the background in most of the images the BCE loss will be heavily biased, you should weight the classes or use another loss function. — JoOkuma, Mar 06 '20 at 18:33
@JoOkuma Thank you for your suggestion on `zero_grad`. However, my training accuracy is still stuck at 0.5. I have tested for 10 epochs so far; the loss does decrease but the accuracy stays at `0.5`. — Sulphur, Mar 06 '20 at 19:07
Also, the images I am currently using for training have balanced classes. So, in my opinion, that should not be a problem. — Sulphur, Mar 06 '20 at 19:17
@danche Can you please give inputs by any chance? I have seen you have answered similar questions in the past. Thank you! — Sulphur, Mar 07 '20 at 00:25
@oezguensi I saw your answer [here](https://stackoverflow.com/questions/49390842/cross-entropy-in-pytorch/53773196#53773196). Do you think my loss is causing the issue? — Sulphur, Mar 07 '20 at 00:26
