7

I'm running a Mask R-CNN model on an edge device (with an NVIDIA GTX 1080). I am currently using the Detectron2 Mask R-CNN implementation and I achieve an inference speed of around 5 FPS.

To speed this up I looked at other inference engines and model implementations. For example, I tried ONNX, but it did not give me a faster inference speed.

TensorRT looks very promising to me, but I did not find a ready "out-of-the-box" implementation for it.

Are there any other mature and fast inference engines or other techniques to speed up the inference?

Sharif Elfouly
  • Do you need Mask R-CNN? You could check YOLOv3 or RetinaNet, as those are one-stage detectors (no proposal phase); YOLO in particular is pretty fast on similar tasks. You can find some comparisons [here](https://medium.com/@jonathan_hui/object-detection-speed-and-accuracy-comparison-faster-r-cnn-r-fcn-ssd-and-yolo-5425656ae359). – Szymon Maszke Dec 18 '19 at 20:22
  • I need instance segmentation... – Sharif Elfouly Dec 19 '19 at 07:11

3 Answers

3

It's almost impossible to get a higher inference speed for Mask R-CNN on a GTX 1080. You may check Detectron2 by Facebook AI Research.

Otherwise, I'd suggest using YOLACT (You Only Look At CoefficienTs), which can achieve real-time instance segmentation.

On the other hand, if you don't need instance segmentation, you can use YOLO, SSD, etc. for object detection.

kHarshit
  • Hey, thank you for your answer. I already looked at YOLACT and I find its architecture very cool (the paper is also very readable). The problem is that YOLACT's performance on our dataset is very bad compared to Mask R-CNN, which is why I have to use Mask R-CNN. And since the problem is instance segmentation, YOLO or SSD would not work. – Sharif Elfouly Dec 19 '19 at 07:11
2

Try OpenCV 4.5.0 with the DNN_BACKEND_CUDA backend and the DNN_TARGET_CUDA or DNN_TARGET_CUDA_FP16 target.
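For anyone who wants to try this from Python, here is a minimal sketch of selecting that backend and target. It is not the benchmark code from this answer. It assumes a Mask R-CNN exported from the TensorFlow Object Detection API (a Detectron2 checkpoint will not load this way), an OpenCV build compiled with CUDA support, and the output layer names used in OpenCV's own Mask R-CNN sample. The function names and the file paths passed in are hypothetical.

```python
def load_mask_rcnn(model_path, config_path, fp16=False):
    """Load a TensorFlow Mask R-CNN into OpenCV DNN and select the CUDA backend."""
    # Imported lazily; requires an OpenCV build compiled with CUDA support.
    import cv2

    net = cv2.dnn.readNetFromTensorflow(model_path, config_path)
    net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
    # FP16 mainly helps on GPUs with fast half-precision arithmetic.
    target = cv2.dnn.DNN_TARGET_CUDA_FP16 if fp16 else cv2.dnn.DNN_TARGET_CUDA
    net.setPreferableTarget(target)
    return net


def detect(net, image_bgr,
           out_names=("detection_out_final", "detection_masks")):
    """Run one forward pass and return [boxes, masks] as raw network outputs."""
    import cv2

    # Assumption: the model expects RGB input, so swap OpenCV's BGR channel order.
    blob = cv2.dnn.blobFromImage(image_bgr, swapRB=True, crop=False)
    net.setInput(blob)
    return net.forward(list(out_names))
```

Usage would look something like `net = load_mask_rcnn("frozen_inference_graph.pb", "mask_rcnn.pbtxt", fp16=True)` followed by `boxes, masks = detect(net, frame)`, where both file names are placeholders for your own exported model.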

Mask R-CNN with a 1024 x 1024 input image:

Device             | FPS
------------------ | -------
GTX 1080 Ti (FP32) | 29
RTX 2080 Ti (FP16) | 60

The measured FPS includes NMS but excludes other preprocessing and postprocessing. The network runs fully end-to-end on the GPU.

Benchmark code: https://gist.github.com/YashasSamaga/48bdb167303e10f4d07b754888ddbdcf
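To compare numbers like these with your own setup, it helps to time only the forward pass. The following is a rough sketch of such a measurement, not the linked gist: `measure_fps`, its `warmup` parameter, and the injectable `clock` are names I made up for illustration. It assumes `inputs` is already preprocessed and that `infer` blocks until the result is available (with frameworks that run the GPU asynchronously you would need to synchronize inside `infer`).

```python
import time


def measure_fps(infer, inputs, warmup=3, clock=time.perf_counter):
    """Return frames per second for `infer`, timing only the calls to it."""
    if not inputs:
        raise ValueError("inputs must not be empty")

    # Warm-up runs absorb one-time costs such as lazy initialization,
    # so they do not distort the timed loop.
    for x in inputs[:warmup]:
        infer(x)

    start = clock()
    for x in inputs:
        infer(x)
    elapsed = clock() - start

    return len(inputs) / elapsed
```

With OpenCV's DNN module, `infer` could be a small function that calls `net.setInput(blob)` and then `net.forward(...)`, with the blobs built beforehand so that preprocessing stays outside the timed loop.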

Yashas
0

As @kHarshit already mentioned, it is very hard to speed up Mask R-CNN any further.

The fastest instance segmentation model that I found is YolactEdge: Real-time Instance Segmentation on the Edge (Jetson AGX Xavier: 30 FPS, RTX 2080 Ti: 170 FPS).

Its accuracy is worse than that of Mask R-CNN, or even YOLACT, but it is still very good.

Sharif Elfouly