I am training SSD512 (ImageNet pre-trained model) and Faster R-CNN (pre-trained). During training, the loss and confidence are displayed as nan and the validation result as 0.
[Basketball-ChainerCV](https://github.com/atom2k17/Basketball-ChainerCV/blob/master/basketballproject.py).
This is the image for SSD300 training below:
When training Faster R-CNN, the following warnings are displayed before the results of the first set of epochs:
/usr/local/lib/python3.6/dist-packages/chainercv/links/model/faster_rcnn/utils/loc2bbox.py:65: RuntimeWarning: overflow encountered in exp
  h = xp.exp(dh) * src_height[:, xp.newaxis]
/usr/local/lib/python3.6/dist-packages/chainercv/links/model/faster_rcnn/utils/loc2bbox.py:65: RuntimeWarning: overflow encountered in multiply
  h = xp.exp(dh) * src_height[:, xp.newaxis]
/usr/local/lib/python3.6/dist-packages/chainercv/links/model/faster_rcnn/utils/loc2bbox.py:66: RuntimeWarning: overflow encountered in exp
  w = xp.exp(dw) * src_width[:, xp.newaxis]
/usr/local/lib/python3.6/dist-packages/chainercv/links/model/faster_rcnn/utils/loc2bbox.py:66: RuntimeWarning: overflow encountered in multiply
  w = xp.exp(dw) * src_width[:, xp.newaxis]
/usr/local/lib/python3.6/dist-packages/chainercv/links/model/faster_rcnn/utils/proposal_creator.py:126: RuntimeWarning: invalid value encountered in greater_equal
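To illustrate what these warnings mean numerically, here is a minimal NumPy sketch of the flagged expression `h = xp.exp(dh) * src_height`. It is not the ChainerCV code itself: the helper name `decode_height`, the 1-D shapes, and the offset values are assumptions made up for the example. It only shows that a large predicted offset overflows float32 to `inf`, and that later arithmetic on `inf` can yield `nan`.

```python
import numpy as np


def decode_height(dh, src_height):
    """Hypothetical stand-in for the flagged line: exp(dh) * src_height in float32."""
    dh = np.asarray(dh, dtype=np.float32)
    src_height = np.asarray(src_height, dtype=np.float32)
    # Silence the RuntimeWarnings so the resulting values can be inspected.
    with np.errstate(over="ignore", invalid="ignore"):
        return np.exp(dh) * src_height


# Made-up anchor heights and predicted log-scale offsets.
src_height = np.array([16.0, 16.0, 16.0], dtype=np.float32)
dh = np.array([0.1, 5.0, 100.0], dtype=np.float32)

h = decode_height(dh, src_height)
print(h)               # the last entry overflows to inf
print(np.isfinite(h))  # mask of entries that stayed finite

# Subtracting inf from inf is undefined and gives nan.
with np.errstate(invalid="ignore"):
    print(h[2] - h[2])
```

In float32, `exp` overflows once its argument exceeds roughly 88, so an offset of that size already suggests the localization outputs have diverged before this line runs.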
Things I have tried:
- Increasing learning rate
- Decreasing batch_size
- Removing images, their annotations, and the corresponding entries in the text files wherever a bounding box is smaller than 1% of the total image size
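For the bounding-box filtering step, this is roughly the kind of check involved, written as a NumPy sketch. The function name `keep_large_boxes`, the threshold parameter, and the sample boxes are hypothetical; the `(y_min, x_min, y_max, x_max)` order follows the ChainerCV bounding-box convention. The sketch also flags zero-sized or inverted boxes, on the assumption that such annotations may be worth ruling out as well.

```python
import numpy as np


def keep_large_boxes(bbox, img_height, img_width, min_area_frac=0.01):
    """Return a boolean mask of boxes covering at least min_area_frac of the image.

    bbox: (R, 4) array in (y_min, x_min, y_max, x_max) order.
    """
    bbox = np.asarray(bbox, dtype=np.float32).reshape(-1, 4)
    heights = bbox[:, 2] - bbox[:, 0]
    widths = bbox[:, 3] - bbox[:, 1]
    # Boxes with non-positive height or width are treated as invalid.
    valid = (heights > 0) & (widths > 0)
    area_frac = (heights * widths) / float(img_height * img_width)
    return valid & (area_frac >= min_area_frac)


# Made-up annotations for a 480 x 640 image.
bbox = np.array([
    [10, 10, 200, 300],   # large box
    [50, 50, 55, 58],     # tiny box, well under 1% of the image
    [120, 80, 120, 200],  # zero height
    [300, 40, 100, 90],   # y_min greater than y_max
], dtype=np.float32)

mask = keep_large_boxes(bbox, img_height=480, img_width=640)
print(mask)
```

Only boxes where the mask is `True` would be kept; an image whose mask is all `False` would be dropped together with its annotation entry.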
Note: Everything works fine with SSD300; the issues occur only with the SSD512 and Faster R-CNN models. All the models are pre-trained on the ImageNet dataset.
What could be causing this problem? Can anyone give pointers on how to deal with it?