If I run a YOLOv4 model with leaky ReLU activations on my CPU, on 256x256 RGB images, through OpenCV with the OpenVINO backend, inference plus non-max suppression takes about 80 ms. If, on the other hand, I convert the model to an IR following https://github.com/TNTWEN/OpenVINO-YOLOV4 (which is linked from https://github.com/AlexeyAB/darknet), inference directly through the OpenVINO inference engine takes roughly 130 ms. That figure does not even include non-max suppression, which is quite slow when implemented naively in Python.
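For context on the NMS cost: my naive implementation compares boxes one pair at a time in pure Python. A vectorized greedy NMS in NumPy (a minimal sketch of the standard algorithm, not OpenCV's internal implementation; OpenCV also ships `cv2.dnn.NMSBoxes` for this) is much faster for the box counts YOLO produces:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.45):
    """Greedy NMS. boxes: (N, 4) array of [x1, y1, x2, y2]; returns kept indices."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Vectorized IoU of the current top-scoring box against all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # Keep only boxes whose overlap with the chosen box is below threshold
        order = order[1:][iou <= iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # → [0, 2]: box 1 overlaps box 0 heavily and is suppressed
```

The inner loop still runs once per surviving box, but each iteration suppresses many boxes at once via array operations instead of a Python-level pairwise loop.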
Unfortunately, OpenCV does not offer all of the control I would like over the models and inference schemes I want to try (for example, I want to change the batch size, import models from YOLO repositories other than darknet, etc.).
What is the magic that allows OpenCV with the OpenVINO backend to be so much faster?