
I am testing the performance of some samples in the OpenCV source tree depending on whether Halide is used or not.

Surprisingly, the performance is worse when Halide is used for the computation:

  • squeezenet_halide: ~24 ms with Halide and ~16 ms without Halide.
  • resnet_ssd_face: ~84 ms with Halide and ~36 ms without Halide.

I have compiled Halide and OpenCV following the instructions in this tutorial. The OpenCV code was downloaded from the master branch of the OpenCV Git repository.

I have tested the performance using the sample files 'resnet_ssd_face.cpp' and 'squeezenet_halide.cpp'. In both cases I include one of these code lines just before calling 'forward', to activate or deactivate Halide:

net.setPreferableBackend(DNN_BACKEND_HALIDE);  // use Halide

net.setPreferableBackend(DNN_BACKEND_DEFAULT);  // do NOT use Halide

The time is measured using this code just after the call to the 'forward' function:

std::vector<double> layersTimings;
double freq = cv::getTickFrequency() / 1000;
double time = net.getPerfProfile(layersTimings) / freq;
std::cout << "Time: " << time << " ms" << std::endl;

Is anything missing from the tutorial? Should Halide be compiled with different parameters?

My setup is:

OS: Linux (Ubuntu 16.04)
CPU: Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz
GPU: nVidia GeForce GT 730 (Driver Version: 384.90)
Cuda: CUDA Version 9.0.176
goe
  • Don't have time to delve in deeply right now, but the timing is going to be very backend dependent, and it is possible that highly optimized libraries will be faster for some things, especially as I expect the Halide scheduling is still a work in progress. Halide execution will also involve a just-in-time compilation step on the first invocation, which may be quite costly even if amortized over a lot of calls. Accurate timing will require a warmup step to make sure everything is compiled. (There are likely two JIT steps if targeting e.g. OpenCL, but both will happen on first execution.) – Zalman Stern Nov 09 '17 at 22:55
  • I can understand the initialization cost in the first step. However, 'resnet_ssd_face' processes several video frames, so the initialization should only happen on the initial frame, and the 'squeezenet_halide' sample is measured using a for loop and computing the average time spent, ignoring the first iteration. Moreover, the title of the tutorial claims an efficiency improvement with the use of Halide. I think it is difficult for the OpenCV team to be wrong about this kind of thing. – goe Nov 10 '17 at 07:56
  • Actually, the answer is in the default backend's improvements. At the time of Halide's first appearance in OpenCV it really was faster. However, it's slower now. goe, you're right about the title, but there is a reference to actual efficiency measurements. I think the next tutorial patch should replace the title with something more neutral. Thanks! BTW, it's a good chance to experiment with Halide's newly introduced autoscheduling approach. – Dmitry Kurtaev Nov 13 '17 at 16:59
  • If the Halide backend went from "faster" to "slower", it sounds like some automated benchmarks to test for performance regression are definitely in order :-) – Steven Johnson Nov 13 '17 at 17:31
  • Thanks all. Those are good comments. I have added an answer incorporating all the info from the comments. I hope to be able to update the answer with better news in the future. – goe Nov 14 '17 at 09:45

2 Answers


Taking into account the comment by Dmitry Kurtaev and looking at the wiki in the OpenCV GitHub account, I found a page where a benchmark comparing different approaches is included (I missed the links in the tutorial).

Also, there is a merge request where a similar benchmark is included.

In both of them, the time measurements show that the performance using Halide is worse than with the original C++ approach.

I can assume that the Halide integration is at an early stage. Moreover, as Zalman Stern comments, the Halide scheduling is a work in progress, and the original optimizations in the dnn module of OpenCV could be more finely tuned than the scheduling included for Halide.

I hope these measurements change in future versions of OpenCV, but for now, this is the performance.

goe
  • goe, I would like to recommend the [Intel Deep Learning Inference Engine](https://software.intel.com/en-us/inference-engine-devguide) backend for the OpenCV deep learning module. Please check out the [wiki page](https://github.com/opencv/opencv/wiki/Intel's-Deep-Learning-Inference-Engine-backend) for details. – Dmitry Kurtaev Feb 08 '18 at 15:49
  • Thanks for the comment, I will test this ASAP. It looks great for Linux and Windows, but it looks difficult to run on iOS and Android. – goe Feb 08 '18 at 16:13

My answer is slightly unrelated, but may be helpful.

For face detection + face alignment:

Normal SSD detection time: 50-55 ms

Using the OpenVINO inference engine: 40-45 ms


Naman
  • Do you have example code that shows how to use the Inference Engine face detection model in OpenCV? Namely: I load the model using OpenCV dnn::Net::readXXXXXX(xml, bin) without problems, but the next step is how to pass the frame (cv::Mat) to the network and get the result. Thanks!! – Bahramdun Adil Dec 05 '18 at 07:16