I'm running a Mask R-CNN model on an edge device equipped with an NVIDIA GTX 1080. I'm currently using the Detectron2 Mask R-CNN implementation and achieve an inference speed of around 5 FPS.
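For reference, this is roughly how I set up the predictor and measure throughput (a minimal sketch; the model-zoo config is the standard R50-FPN one, and the image path is a placeholder for my actual input):

```python
import time

import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

# Standard Mask R-CNN R50-FPN config from the Detectron2 model zoo
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.DEVICE = "cuda"  # the GTX 1080

predictor = DefaultPredictor(cfg)
image = cv2.imread("test.jpg")  # placeholder test frame

# Warm up, then time repeated single-image inference
for _ in range(5):
    predictor(image)
start = time.perf_counter()
n = 50
for _ in range(n):
    predictor(image)
print(f"{n / (time.perf_counter() - start):.1f} FPS")
```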
To speed this up, I looked at other inference engines and model implementations, for example ONNX, but I wasn't able to gain any speedup.
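My ONNX export attempt looked roughly like this (a sketch based on Detectron2's `TracingAdapter`; the opset version and output filename are just illustrative, and `cfg`/`image` come from the snippet above):

```python
import torch
from detectron2.checkpoint import DetectionCheckpointer
from detectron2.export import TracingAdapter
from detectron2.modeling import build_model

# Build the model and load the same weights as the predictor above
model = build_model(cfg)
DetectionCheckpointer(model).load(cfg.MODEL.WEIGHTS)
model.eval()

# Wrap the model so its dict-based inputs/outputs become
# flat tensors that torch.onnx.export can trace
height, width = image.shape[:2]
image_tensor = torch.as_tensor(image.astype("float32").transpose(2, 0, 1))
inputs = [{"image": image_tensor, "height": height, "width": width}]

adapter = TracingAdapter(model, inputs)
torch.onnx.export(
    adapter,
    adapter.flattened_inputs,
    "mask_rcnn.onnx",  # illustrative output path
    opset_version=16,
)
```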
TensorRT looks very promising to me, but I did not find a ready, out-of-the-box implementation for it.
Are there any other mature, fast inference engines, or other techniques to speed up inference?