I'm running a Mask R-CNN model on an edge device equipped with an NVIDIA GTX 1080. I'm currently using the Detectron2 Mask R-CNN implementation and achieve an inference speed of around 5 FPS.
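For reference, this is roughly how I set up the predictor and measure throughput (a minimal sketch; the model-zoo config is the standard R50-FPN one, and the image path is a placeholder for my actual input):

```python
import time

import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

# Standard Mask R-CNN R50-FPN config from the Detectron2 model zoo
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.DEVICE = "cuda"  # the GTX 1080

predictor = DefaultPredictor(cfg)
image = cv2.imread("test.jpg")  # placeholder test frame

# Warm up, then time repeated single-image inference
for _ in range(5):
    predictor(image)
start = time.perf_counter()
n = 50
for _ in range(n):
    predictor(image)
print(f"{n / (time.perf_counter() - start):.1f} FPS")
```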
To speed this up, I looked at other inference engines and model implementations, for example ONNX, but I wasn't able to gain any speedup.
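My ONNX export attempt looked roughly like this (a sketch based on Detectron2's `TracingAdapter`; the opset version and output filename are just illustrative, and `cfg`/`image` come from the snippet above):

```python
import torch
from detectron2.checkpoint import DetectionCheckpointer
from detectron2.export import TracingAdapter
from detectron2.modeling import build_model

# Build the model and load the same weights as the predictor above
model = build_model(cfg)
DetectionCheckpointer(model).load(cfg.MODEL.WEIGHTS)
model.eval()

# Wrap the model so its dict-based inputs/outputs become
# flat tensors that torch.onnx.export can trace
height, width = image.shape[:2]
image_tensor = torch.as_tensor(image.astype("float32").transpose(2, 0, 1))
inputs = [{"image": image_tensor, "height": height, "width": width}]

adapter = TracingAdapter(model, inputs)
torch.onnx.export(
    adapter,
    adapter.flattened_inputs,
    "mask_rcnn.onnx",  # illustrative output path
    opset_version=16,
)
```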
TensorRT looks very promising to me, but I did not find a ready, out-of-the-box implementation for it.
Are there any other mature, fast inference engines, or other techniques to speed up inference?