I have a working CNN-LSTM model that predicts keypoints of human body parts in videos.
Currently, I have four keypoints as labels: right hand, left hand, head, and pelvis. On some frames not all four parts are visible, so by default I set the missing ones to (0,0), which acts as a null coordinate.
The problem I faced was that the model took those null points into account and tried to regress on them within a sequence.
So I excluded the (0,0) points from the loss calculation and from backpropagation, and it works much better.
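
To make the masking concrete, here is a minimal sketch of the idea, assuming NumPy arrays of shape (frames, keypoints, 2) and a mean-squared-error loss. The name `masked_mse` and the shapes are hypothetical, chosen only for illustration. In an autograd framework the same multiplication by the mask also zeroes the gradient for the masked points.

```python
import numpy as np

def masked_mse(pred, target):
    """Mean squared error over visible keypoints only.

    pred, target: arrays of shape (frames, keypoints, 2).
    Assumption: a target of exactly (0, 0) means "not visible".
    """
    # True where at least one coordinate of the label is non-zero.
    visible = np.any(target != 0, axis=-1)            # (frames, keypoints)
    # Squared distance between prediction and label per keypoint.
    sq_err = np.sum((pred - target) ** 2, axis=-1)    # (frames, keypoints)
    n_visible = visible.sum()
    if n_visible == 0:
        return 0.0  # nothing to regress on in this batch
    # Masked entries contribute zero to the sum.
    return float((sq_err * visible).sum() / n_visible)
```

With this, a frame whose label is (0,0) for one keypoint contributes nothing to the loss for that keypoint, whatever the model predicts there.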
However, all four points are still predicted on every frame, so I am looking for a way to make the model predict a variable number of keypoints.
I thought of adding a third parameter per keypoint (is it visible?), but that would probably add complexity and might hurt the model's accuracy.
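
A rough sketch of what the visibility idea could look like, assuming the model outputs three values per keypoint (x, y, and a visibility logit). Everything here (`keypoint_loss`, `decode_frame`, `NAMES`, `vis_weight`, the 0.5 threshold) is a hypothetical name or value for illustration, not something I have implemented. The loss combines a binary cross-entropy term on visibility with a coordinate term that only counts visible points, and decoding keeps only the keypoints predicted visible, which gives a variable number of keypoints per frame.

```python
import numpy as np

NAMES = ("right_hand", "left_hand", "head", "pelvis")

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def keypoint_loss(pred, target_xy, target_vis, vis_weight=1.0):
    """pred: (..., K, 3) holding x, y and a visibility logit.
    target_xy: (..., K, 2); target_vis: (..., K) with 1.0 = visible, 0.0 = hidden.
    """
    pred_xy, vis_logit = pred[..., :2], pred[..., 2]
    # Binary cross-entropy on the visibility flag, computed for every keypoint.
    p = np.clip(sigmoid(vis_logit), 1e-7, 1 - 1e-7)
    bce = -(target_vis * np.log(p) + (1 - target_vis) * np.log(1 - p)).mean()
    # Squared coordinate error, counted only where the keypoint is visible.
    sq_err = ((pred_xy - target_xy) ** 2).sum(axis=-1)
    coord = (sq_err * target_vis).sum() / max(target_vis.sum(), 1)
    return float(coord + vis_weight * bce)

def decode_frame(pred_frame, threshold=0.5):
    """pred_frame: (K, 3). Return only the keypoints predicted visible."""
    probs = sigmoid(pred_frame[:, 2])
    return {
        name: (float(x), float(y))
        for name, (x, y, _), p in zip(NAMES, pred_frame, probs)
        if p >= threshold
    }
```

The network itself still has a fixed-size output; the variable number of keypoints comes from thresholding the visibility score at inference time. The weight between the two loss terms would likely need tuning.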