
I aim to start learning OpenCV little by little, but first I need to decide which OpenCV API is more useful. I expect that the Python implementation will be shorter, but that its running time will be heavier and slower compared to the native C++ implementation. Can anyone comment on the performance and coding differences between these two approaches?

erogol
  • Most of the real work is done behind the scenes by the OpenCV `C` code anyway, so provided your own code is not too elaborate, the difference should not be as big as you'd naively expect. – juanchopanza Nov 17 '12 at 17:20

5 Answers


As mentioned in earlier answers, Python is slower than C++ or C. Python is built for simplicity, portability and, moreover, creativity, where users need to worry only about their algorithm, not about programming troubles.

But with OpenCV, things are a bit different. Python-OpenCV is just a wrapper around the original C/C++ code. It is normally used to combine the best features of both languages: the performance of C/C++ and the simplicity of Python.

So when you call an OpenCV function from Python, what actually runs is the underlying C/C++ code, and there won't be much difference in performance. (I remember reading somewhere that the wrapper penalty is <1%, though I don't remember where. A rough estimate with some basic OpenCV functions shows a worst-case penalty of <4%, i.e. penalty = (maximum time taken in Python - minimum time taken in C++) / minimum time taken in C++.)

The problem arises when your code contains a lot of native Python code. For example, if you write your own functions that are not available in OpenCV, things get worse: that code runs in the Python interpreter, which reduces performance considerably.

But the new OpenCV-Python interface has full support for NumPy. NumPy is a package for scientific computing in Python, and it too is a wrapper around native C code. It is a highly optimized library that supports a wide variety of matrix operations and is highly suitable for image processing. So if you combine OpenCV functions and NumPy functions correctly, you will get very fast code.

The thing to remember is: always try to avoid loops and iteration in Python. Instead, use the array-manipulation facilities available in NumPy (and OpenCV). Simply adding two NumPy arrays with C = A + B is many times faster than using a double loop, as the sketch below illustrates.
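A minimal timing sketch of that point (the array size is arbitrary and the measured ratio will vary from machine to machine):

import time
import numpy as np

A = np.random.rand(1000, 1000)
B = np.random.rand(1000, 1000)

# Vectorized addition: the element-wise loop runs in optimized C inside NumPy.
t0 = time.time()
C = A + B
t_vec = time.time() - t0

# The same addition as an explicit double loop in the Python interpreter.
D = np.empty_like(A)
t0 = time.time()
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        D[i, j] = A[i, j] + B[i, j]
t_loop = time.time() - t0

print("vectorized: {:.4f} s, double loop: {:.4f} s".format(t_vec, t_loop))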

For example, you can check these articles:

  1. Fast Array Manipulation in Python
  2. Performance comparison of OpenCV-Python interfaces, cv and cv2
Abid Rahman K

All Google results for OpenCV state the same thing: that Python will only be slightly slower. But not once have I seen any profiling on that, so I decided to do some and discovered:

Python is significantly slower than C++ with OpenCV, even for trivial programs.

The simplest example I could think of was to display the output of a webcam on-screen along with the number of frames per second. With Python, I achieved 50 FPS (on an Intel Atom). With C++, I got 65 FPS, an increase of 25%. In both cases, the program was using a single core and, to the best of my knowledge, was bound by the performance of the CPU. Additionally, this test case roughly aligns with what I have seen in projects I've ported from one language to the other in the past.

Where does this difference come from? In Python, all of the OpenCV functions return new copies of the image matrices. Whenever you capture an image or resize it, in C++ you can re-use existing memory; in Python you cannot. I suspect the time spent allocating memory is the major difference, because, as others have said, the underlying code of OpenCV is C++.

Before you throw Python out the window: Python is much faster to develop in, so as long as you aren't running into hardware constraints, or if development speed is more important than performance, use Python. In many applications I've done with OpenCV, I've started in Python and later converted only the computer-vision components to C++ (e.g. using Python's ctypes module and compiling the CV code into a shared library), roughly along the lines sketched below.
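A minimal, hypothetical ctypes sketch of that hand-off; the library name `libcvsteps.so` and the function `process_frame` are made-up placeholders, not part of any real project:

import ctypes
import numpy as np

# Load the hypothetical shared library built from the C++ vision code, e.g.
#   g++ -O2 -shared -fPIC cv_steps.cpp -o libcvsteps.so $(pkg-config --cflags --libs opencv4)
lib = ctypes.CDLL("./libcvsteps.so")
lib.process_frame.argtypes = [ctypes.POINTER(ctypes.c_uint8), ctypes.c_int, ctypes.c_int]
lib.process_frame.restype = None

frame = np.zeros((480, 640, 3), dtype=np.uint8)  # e.g. a frame grabbed with cv2.VideoCapture
ptr = frame.ctypes.data_as(ctypes.POINTER(ctypes.c_uint8))
lib.process_frame(ptr, frame.shape[1], frame.shape[0])  # the C++ side processes the buffer in place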

Python benchmark code:

import cv2
import time

FPS_SMOOTHING = 0.9

cap = cv2.VideoCapture(2)
fps = 0.0
prev = time.time()
while True:
    now = time.time()
    fps = (fps*FPS_SMOOTHING + (1/(now - prev))*(1.0 - FPS_SMOOTHING))
    prev = now

    print("fps: {:.1f}".format(fps))

    got, frame = cap.read()
    if got:
        cv2.imshow("asdf", frame)
    if (cv2.waitKey(2) == 27):
        break

C++ benchmark code:

#include <opencv2/opencv.hpp>
#include <cstdio>   // printf
#include <ctime>    // clock, CLOCKS_PER_SEC

using namespace std;
using namespace cv;

#define FPS_SMOOTHING 0.9

int main(int argc, char** argv){
    VideoCapture cap(2);
    Mat frame;

    float fps = 0.0;
    double prev = clock() / (double)CLOCKS_PER_SEC;  // clock() measures CPU time; divide so units match 'now' below
    while (true){
        double now = (clock()/(double)CLOCKS_PER_SEC);
        fps = (fps*FPS_SMOOTHING + (1/(now - prev))*(1.0 - FPS_SMOOTHING));
        prev = now;

        printf("fps: %.1f\n", fps);

        if (cap.isOpened()){
            cap.read(frame);
        }
        imshow("asdf", frame);
        if (waitKey(2) == 27){
            break;
        }
    }
}

Possible benchmark limitations:

  • Camera frame rate
  • Timer measuring precision
  • Time spent in print formatting
sdfgeoff
  • Your test case just happens to be the one that would show the most difference between Python and C++, so it might not be realistic. A better test would look at the video frame and, say, try to compute the aim point on a road for a self-driving car; that would have nearly the same run time in C++ or Python. The no-processing case shows how long it takes to load frame buffers without doing any real work, so frame loading dominates the time. If you were doing real work, frame buffering would be only 2% of the total, not 100% of it. – user3150208 Apr 11 '19 at 22:43
  • While I don't currently have any benchmarks, I suspect that it is more significant than you predict. For example, if you run `dst = cv2.filter2D(img, -1, kernel)` in Python, the computer creates a copy of `img` and returns it as `dst`. If you don't use `img`, the GC comes and cleans up the old image. There is no way around this with the OpenCV Python API. In C/C++, you can easily create a static image buffer of the correct size that does not get created/destroyed every frame. The time for memory allocations and freeing is not zero. – sdfgeoff Apr 26 '19 at 08:13
  • Wouldn't an increase in frame rate from 50 to 65 be a 30% improvement (rather than 25%)? – AmigoNico Oct 10 '20 at 22:08
  • this answer has serious issues. -- **no**, the python wrapper doesn't make copies that the C++ APIs wouldn't make as well. -- memory allocations take negligible amounts of time, if they happen. -- timing differences in these example programs should be negligible and are for most people. whatever's causing these significant differences in the answer's measurements has not been explored. could be a different computer, different opencv versions, different operating systems, lots of possible differences – Christoph Rackwitz Jan 28 '23 at 20:14
  • @ChristophRackwitz It probably does have issues: it only provides one data point, and as you say, it only speculates as to the cause. However, the performance difference was real and measured. It was run on the exact same hardware, the runs were run multiple times within seconds of each other. The configuration of the machine didn't change etc. I'd love to see someone do some more thorough profiling work on _why_, but AFAIK I'm the only one who has actually bothered to try and measure the performance difference. – sdfgeoff Jan 30 '23 at 18:26

The answer from sdfgeoff is missing the fact that you can reuse arrays in Python. Preallocate them and pass them in, and they will get used. So:

    image = numpy.zeros(shape=(height, width, 3), dtype=numpy.uint8)
    # ... cap is an opened cv.VideoCapture whose frames are (height, width, 3) ...
    retval, image = cap.read(image)  # fills the preallocated buffer instead of allocating a new one
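Many other functions take the same optional output argument; for instance, `cv.filter2D` accepts a `dst` array (see the comments below). A small hedged sketch, with a made-up frame size and a simple box-blur kernel:

    import numpy as np
    import cv2 as cv

    img = np.zeros((480, 640, 3), dtype=np.uint8)     # placeholder input frame
    dst = np.empty_like(img)                          # preallocated output buffer
    kernel = np.ones((3, 3), dtype=np.float32) / 9.0  # simple box-blur kernel
    cv.filter2D(img, -1, kernel, dst=dst)             # result is written into dst, which is reused on every call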
Paul Rensing
  • As far as I can tell, many functions (such as filter2D) do not take destination arrays as parameters. However, if you can point me to some docs that say otherwise, I will gladly change my answer. I'd also be very interested to see a performance comparison with this technique. – sdfgeoff Apr 26 '19 at 08:18
  • Not sure why you say that. Here is the doc for filter2D: https://docs.opencv.org/3.4/d4/d86/group__imgproc__filter.html#ga27c049795ce870216ddfb366086b5a04 Notice the 4th parameter in Python is "dst", which is the destination array. I have not checked everywhere, but the standard is that if there is a destination arg in C++, then it is there in Python – Paul Rensing Apr 27 '19 at 19:05
  • Huh, you're right. I'd not noticed that before. I guess I'll have to redo my performance comparison – sdfgeoff Apr 28 '19 at 21:44
  • can you elaborate on how the preallocation speeds up the process? I didn't quite get that part, because to me it looks like i'm just allocating the space at a different point, but allocating it all the same (?) – fogx Nov 12 '19 at 18:18
  • The savings for pre-allocating would come from allocating the array once, but then calling VideoCapture.read() or filter2d() inside a loop. A common usage might be to initialize, and then loop forever, reading an image from the camera and processing it. Pre-allocating would save a millisecond or so each iteration. – Paul Rensing Nov 14 '19 at 13:23
  • if you allocate with `np.empty`, *numpy* doesn't even have to zero out that memory, and it'll be **microseconds**. worrying about allocations is silly without *proper* measurements. – Christoph Rackwitz Jan 28 '23 at 20:26
  • Yes it has been a while since I worked on this, and I can't remember whether I measured anything directly related to pre-allocation. However, this kind of code is often run once per camera frame, so a minimum of 30FPS and newer cameras at 120FPS or faster, and the image could be 6MB of data (for a 1K color image). Even if the allocation is really quick (but I don't believe microseconds), you are talking about allocating and freeing 700MB/sec. That could really impact the memory allocation and garbage collector. – Paul Rensing Jan 29 '23 at 21:15

You're right that Python is almost always significantly slower than C++, as it requires an interpreter, which C++ does not. Compilation, however, requires C++ to be statically typed, which leaves a much smaller margin for error. Some people prefer being made to code strictly, whereas others enjoy Python's inherent leniency.

If you want a full discourse on Python coding styles vs. C++ coding styles, this is not the best place; try finding an article.

EDIT: Because Python is an interpreted language while C++ is compiled down to machine code, you can generally obtain performance advantages using C++. However, with regard to OpenCV, the core OpenCV libraries are already compiled down to machine code, so the Python wrapper around the library is executing compiled code. In other words, when it comes to executing computationally expensive OpenCV algorithms from Python, you're not going to see much of a performance hit, since they've already been compiled for the specific architecture you're working with.
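As a quick sanity check that the `cv2` module you import is backed by such a compiled native build, you can print its build information (a small sketch; the output depends entirely on your installation):

import cv2

print(cv2.__version__)
# getBuildInformation() reports the compiler, optimization flags and enabled
# backends of the native library that the Python wrapper calls into.
print(cv2.getBuildInformation())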

Cal McLean
  • Yes, python is interpreted, but almost all the work is done inside OpenCV. Let's say there is a 20/80 split, with 80% of the work done inside OpenCV, which is written in compiled C. What we are talking about is how fast the remaining 20% of the code runs. Even if Python is 4X slower, it only adds 30% to the execution time. Many OpenCV apps are a 5/95 split, so Python makes almost no difference – user3150208 Apr 27 '19 at 16:34

Why choose? If you know both Python and C++, use Python for research in Jupyter Notebooks and then use C++ for the implementation. The Python stack of Jupyter, OpenCV (cv2) and NumPy makes for fast prototyping, and porting the code to C++ is usually quite straightforward.

YScharf
  • Yes! I will just add that it also depends on your end application target. You can stay with Python as long as you are meeting your requirements for the end product. Another scenario could be that Python is not available (e.g. embedded board, ...) – Danilo Ramos Jan 24 '22 at 14:09