How to train a neural network with a variable output size

Question

I have a working CNN-LSTM model that predicts keypoints of human body parts in videos.

Currently, I have four keypoints as labels: right hand, left hand, head and pelvis. The problem is that in some frames I can't see all four body parts I want to label, so by default I set those values to (0,0) (a null coordinate).

The problem I faced was that the model took those points into account and tried to regress on them within the sequence.

So I excluded the (0,0) points from the loss calculation, and therefore from backpropagation, and it works much better.

The problem is that all four points are still predicted, so I'm looking for any way to make the model predict a variable number of keypoints.

I thought of adding a third parameter per point (is it visible?), but it would probably add complexity and might confuse the model.

IMO, having an extra binary output to predict the presence/absence of each point is the way to go – leleogere Aug 23 '22 at 12:31
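A minimal PyTorch sketch of that suggestion, assuming the CNN-LSTM produces a feature vector of size `hidden_size` per frame. `KeypointHead`, `hidden_size` and the sizes below are hypothetical, not taken from the question. The last layer outputs three values per keypoint: two coordinates plus a visibility logit.

```python
import torch
import torch.nn as nn


class KeypointHead(nn.Module):
    """Hypothetical output head: per keypoint, predicts (x, y) and a visibility logit."""

    def __init__(self, hidden_size: int, num_points: int = 4):
        super().__init__()
        self.num_points = num_points
        # 3 values per keypoint: x, y and a presence/absence logit
        self.fc = nn.Linear(hidden_size, num_points * 3)

    def forward(self, features: torch.Tensor):
        # features: (..., hidden_size), e.g. LSTM outputs of shape (batch, seq_len, hidden_size)
        out = self.fc(features)
        out = out.view(*features.shape[:-1], self.num_points, 3)
        coords = out[..., :2]     # (..., num_points, 2)
        vis_logits = out[..., 2]  # (..., num_points), apply sigmoid for a probability
        return coords, vis_logits


head = KeypointHead(hidden_size=128)
coords, vis_logits = head(torch.randn(8, 16, 128))  # batch of 8 sequences of 16 frames
```

The network output still has a fixed size; the visibility logit is what lets you decide afterwards which of the four points to keep.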

score 0 · Answer 1 · answered Aug 23 '22 at 13:03


I think that you'll have to write a custom loss function that computes the loss between points only when the target coordinates are not null.

See PyTorch custom loss function for how to write custom losses.

Something like:

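import torch
import torch.nn.functional as F


def loss(outputs, labels):
    # outputs, labels: tensors of shape (num_points, 2) holding (x, y) coordinates
    err = outputs.new_zeros(())  # scalar accumulator on the same device/dtype
    n = 0
    for xo, xt in zip(outputs, labels):
        if torch.all(xt == 0):  # null coord (0, 0): point not labelled, skip it
            continue
        err = err + F.mse_loss(xo, xt)
        n += 1
    return err / max(n, 1)  # avoid division by zero if no point is visible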

This is only a sketch! An alternative that avoids the loop is to use an explicit binary vector (as suggested by @leleogere) and multiply it by the per-coordinate loss before reducing.
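A vectorized sketch of that alternative, assuming `outputs` and `labels` have shape `(..., num_points, 2)` and that a label of exactly (0, 0) means "not visible". The name `masked_mse` is hypothetical. Here the binary mask is built from the labels; if you keep an explicit visibility vector, you could pass it in instead. Like the loop version, it averages over the two coordinates of each point, then over the visible points.

```python
import torch


def masked_mse(outputs: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """MSE over visible keypoints only.

    outputs, labels: (..., num_points, 2); a label of (0, 0) marks a missing point.
    """
    # binary mask: 1.0 where the target point is visible, 0.0 where it is (0, 0)
    mask = (labels != 0).any(dim=-1).float()            # (..., num_points)
    # squared error averaged over x and y -> one value per keypoint
    per_point = ((outputs - labels) ** 2).mean(dim=-1)  # (..., num_points)
    # zero out missing points, then average over visible points only
    n_visible = mask.sum().clamp(min=1.0)
    return (per_point * mask).sum() / n_visible
```

Because the mask multiplies the per-point loss, missing points contribute neither to the loss value nor to the gradients.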

answered Aug 23 '22 at 13:03 by ATony

That is already what I have done. This doesn't tell whether the point is visible or not – Mrofsnart Aug 23 '22 at 14:19
You didn't provide example code, so I was going off your description, and from what you wrote it seemed that the main problem was the learning, which a custom loss will fix. If you need the model to predict whether the point is visible, then you will need separate binary outputs to tell you that, and you will need to add them to the loss. – ATony Aug 23 '22 at 19:46
Sorry if my message was unclear in some way, but I had already managed to ignore the (0,0) points in my loss (and thus in backpropagation). The issue with this technique is that you don't know whether a point predicted by the model is supposed to be visible or not. – Mrofsnart Aug 24 '22 at 07:56
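A minimal sketch of the separate binary outputs described in the comments, assuming the model returns `coords` and `vis_logits` for each keypoint. The function names and the `vis_weight` parameter are hypothetical. The visibility target can be derived from the existing labels, e.g. 1.0 wherever the label is not (0, 0). At inference, thresholding the predicted visibility gives a variable number of keypoints per frame, even though the network always outputs four.

```python
import torch
import torch.nn.functional as F


def keypoint_loss(coords, vis_logits, target_coords, target_vis, vis_weight=1.0):
    """Hypothetical combined loss.

    coords, target_coords: (..., num_points, 2)
    vis_logits, target_vis: (..., num_points); target_vis is 1.0 if visible, else 0.0
    """
    # regression term: only visible points contribute
    per_point = ((coords - target_coords) ** 2).mean(dim=-1)
    reg = (per_point * target_vis).sum() / target_vis.sum().clamp(min=1.0)
    # classification term: every point, visible or not, trains the visibility output
    cls = F.binary_cross_entropy_with_logits(vis_logits, target_vis)
    return reg + vis_weight * cls


def visible_points(coords, vis_logits, threshold=0.5):
    """For one frame, keep only keypoints whose predicted visibility exceeds threshold."""
    keep = torch.sigmoid(vis_logits) > threshold  # (num_points,) boolean mask
    return coords[keep]                           # (num_visible, 2)
```

The coordinate term ignores invisible points, while the binary cross-entropy term trains the visibility output on all of them; `vis_weight` balances the two and would likely need tuning.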
