I have a YOLO-like network architecture where the output layer predicts bounding boxes with coordinates x, y, width, and height. With a linear activation function everything works fine, but the model sometimes predicts negative values, which don't make sense in my case: x and y should lie between 0 and 1, and width and height are 3 or 5. I thought I could use a ReLU activation on the output layer instead, but if I do, the network gets stuck with NaN as the loss value.
Any ideas why that could be?
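For context, here is a toy NumPy sketch of what I mean by the two output activations (the pre-activation values are made up, this is not my actual network code). It also shows the one difference I noticed: ReLU clamps negative pre-activations to exactly 0, and its gradient is 0 over that whole region.

```python
import numpy as np

# Hypothetical pre-activations for one predicted box: x, y, w, h.
# The values are illustrative only, not taken from my real model.
z = np.array([-0.3, 0.7, 2.5, -1.2])

def linear(z):
    # Identity output: can emit negative coordinates.
    return z

def relu(z):
    # Clamps negatives to exactly 0, so no negative coordinates...
    return np.maximum(z, 0.0)

def relu_grad(z):
    # ...but the gradient is 0 wherever z <= 0, so those
    # outputs receive no learning signal at all.
    return (z > 0).astype(float)

print(linear(z))     # negatives possible
print(relu(z))       # no negatives
print(relu_grad(z))  # zero gradient on the clamped units
```

I am wondering whether this zero-gradient region is related to the loss blowing up to NaN, or whether the cause is something else entirely.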