I am comparing object detection models for maritime Search and Rescue (SAR) purposes. Of the models I tried, the best results came from an improved version of YOLOv3 for small object detection and from Faster R-CNN.
YOLOv3 gave the best mAP@50, but Faster R-CNN was better on all the other metrics (precision, recall, F1 score). How should I interpret this, and which model is really better in this case?
I would like to add that there are only two classes in the dataset: small and large objects. We chose this setup because distinguishing between object classes is less important to us than detecting any object of human origin.
However, "small objects" does not mean small GT bounding boxes. These are objects that actually have a small physical area, less than 2 square meters (e.g. people, buoys). Large objects are objects with a larger area (boats, ships, canoes, etc.).
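To make the metric comparison concrete, here is a minimal sketch of how I understand the two kinds of numbers. As far as I know, AP@50 summarises the whole precision-recall curve over all confidence thresholds, while precision, recall, and F1 are reported at a single confidence threshold. Everything below is a toy illustration and not my actual results: the scores, the true-positive flags (assumed to be already decided at IoU >= 0.5), the number of ground-truth objects, and the 0.5 confidence threshold are all made up, and the function names are hypothetical.

```python
def pr_at_threshold(scores, is_tp, n_gt, thr):
    """Precision, recall and F1 using only detections with score >= thr."""
    kept = [tp for s, tp in zip(scores, is_tp) if s >= thr]
    tp = sum(kept)
    precision = tp / len(kept) if kept else 0.0
    recall = tp / n_gt if n_gt else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def average_precision(scores, is_tp, n_gt):
    """Area under the precision-recall curve (all-point interpolation)."""
    # Rank detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp_cum = 0
    precisions, recalls = [], []
    for rank, i in enumerate(order, start=1):
        tp_cum += is_tp[i]
        precisions.append(tp_cum / rank)
        recalls.append(tp_cum / n_gt)
    # Precision envelope: make precision non-increasing as recall grows.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum rectangle areas between consecutive recall values.
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_r) * p
        prev_r = r
    return ap


N_GT = 4  # toy number of ground-truth objects (assumption)

# Toy model A: finds every object, but half of them with low confidence.
scores_a = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
is_tp_a = [1, 0, 1, 1, 1, 0]

# Toy model B: misses one object, but is confident about what it finds.
scores_b = [0.9, 0.8, 0.7, 0.6]
is_tp_b = [1, 1, 1, 0]

ap_a = average_precision(scores_a, is_tp_a, N_GT)
ap_b = average_precision(scores_b, is_tp_b, N_GT)
p_a, r_a, f1_a = pr_at_threshold(scores_a, is_tp_a, N_GT, thr=0.5)
p_b, r_b, f1_b = pr_at_threshold(scores_b, is_tp_b, N_GT, thr=0.5)

print(f"A: AP={ap_a:.3f}  P={p_a:.3f}  R={r_a:.3f}  F1={f1_a:.3f}")
print(f"B: AP={ap_b:.3f}  P={p_b:.3f}  R={r_b:.3f}  F1={f1_b:.3f}")
```

In this toy case model A has the higher AP while model B has the higher precision, recall, and F1 at the 0.5 confidence threshold, which is the same pattern I see between YOLOv3 and Faster R-CNN. I do not know whether this is what is actually happening with my models.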
Here are the results per category:
And two sample images from the dataset (with YOLOv3 detections):