I'm currently developing a cross-platform application in C++, mainly targeting Android and iOS. Overall it works pretty well and performs excellently, but on the iPhone 4 and 4S it runs very, very slowly (see the figures below).
The aim is to process about 5-10 frames per second of a video stream with a specific algorithm.
Among others, the code was successfully tested (5 or more processed frames per second) and profiled on the following devices:
- Google Nexus 4
- Google Nexus 5
- Galaxy S mini
- Galaxy S3
- Sony Xperia Z
- Google Nexus One (yes, it works there too)
- Huawei P1 and P2
- Galaxy Note
- iPad2 mini
- iPhone 5
- iPhone 5s
However, as mentioned, it does not work on the iPhone 4 and iPhone 4S: both process only one frame every two seconds, i.e. 0.5 fps.
Of course, this seems a bit strange, since it works on "weaker" devices like the Huawei phones and even the Nexus One (2 fps), so I started profiling performance and memory consumption with Instruments.
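To put these numbers in perspective, a frame rate f corresponds to a per-frame time budget t = 1/f. The target rate and the rate observed on the iPhone 4/4S work out as follows:

$$
t = \frac{1}{f}: \qquad \frac{1}{10\ \mathrm{s^{-1}}} = 100\ \mathrm{ms}, \qquad \frac{1}{5\ \mathrm{s^{-1}}} = 200\ \mathrm{ms}, \qquad \frac{1}{0.5\ \mathrm{s^{-1}}} = 2000\ \mathrm{ms}
$$

So each frame on the iPhone 4/4S takes 10 to 20 times longer than the target budget allows.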
Memory consumption is OK; at most 16 MB are used (as you can see from the image). However, the runtime profile left me a bit shocked.
And the inverted call tree:
As you can see, the CPU spends a huge share of the total runtime in the cvtColor() function (specifically cv::RGB2RGB). Internally, the parallel_for implementation is used. Could the problem be that the CPU is not suited to running that code? Or is cv::RGB2RGB simply implemented in some odd way in OpenCV, given that the BGR2GRAY conversion seems to run a lot faster?
I use the latest precompiled version of OpenCV for iOS (v2.4.9). The piece of code in question does essentially nothing but convert from BGRA to grayscale. It looks like this:
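One way to narrow down whether the threading itself is the problem is to run the same conversion once serially and once split across threads, and compare the timings on the device. The following is a minimal sketch in plain C++11 without OpenCV, assuming a tightly packed BGRA buffer; the function names are hypothetical, and the row-stripe splitting only mimics the idea of a row-range parallel_for, not OpenCV's actual backend. Within OpenCV itself, cv::setNumThreads() can be used to change how many threads its parallel regions use, though I have not checked how the 2.4.9 iOS build reacts to it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Converts rows [rowBegin, rowEnd) of a packed BGRA image to 8-bit gray using
// Y = 0.299 R + 0.587 G + 0.114 B, with the weights scaled by 2^14 and rounded.
static void bgraRowsToGray(const std::uint8_t* bgra, std::uint8_t* gray,
                           int width, int rowBegin, int rowEnd) {
    for (int y = rowBegin; y < rowEnd; ++y) {
        const std::uint8_t* s = bgra + static_cast<std::size_t>(y) * width * 4;
        std::uint8_t* d = gray + static_cast<std::size_t>(y) * width;
        for (int x = 0; x < width; ++x, s += 4) {
            // s[0] = B, s[1] = G, s[2] = R, s[3] = A (ignored)
            unsigned v = 1868u * s[0] + 9617u * s[1] + 4899u * s[2];
            d[x] = static_cast<std::uint8_t>((v + (1u << 13)) >> 14);
        }
    }
}

// Serial variant: a single pass over all rows on the calling thread.
void bgraToGraySerial(const std::vector<std::uint8_t>& bgra, std::vector<std::uint8_t>& gray,
                      int width, int height) {
    assert(bgra.size() == static_cast<std::size_t>(width) * height * 4);
    gray.resize(static_cast<std::size_t>(width) * height);
    bgraRowsToGray(bgra.data(), gray.data(), width, 0, height);
}

// Parallel variant: splits the rows into nThreads contiguous stripes,
// one std::thread per stripe (hypothetical helper, not OpenCV code).
void bgraToGrayParallel(const std::vector<std::uint8_t>& bgra, std::vector<std::uint8_t>& gray,
                        int width, int height, int nThreads) {
    assert(nThreads > 0);
    assert(bgra.size() == static_cast<std::size_t>(width) * height * 4);
    gray.resize(static_cast<std::size_t>(width) * height);
    std::vector<std::thread> workers;
    const int stripe = (height + nThreads - 1) / nThreads;
    for (int t = 0; t < nThreads; ++t) {
        const int begin = t * stripe;
        const int end = std::min(height, begin + stripe);
        if (begin >= end) break;
        workers.emplace_back(bgraRowsToGray, bgra.data(), gray.data(), width, begin, end);
    }
    for (auto& w : workers) w.join();
}
```

If the parallel variant is not faster than the serial one on the iPhone 4S (or is much slower), that would point towards threading overhead rather than the conversion arithmetic.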
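```cpp
#include <opencv2/imgproc/imgproc.hpp>
using namespace cv;

Mat colorMat;
Mat gray;
colorMat = Mat(vHeight, vWidth, CV_8UC4, rImageData); // wraps the frame buffer, no data is copied
cvtColor(colorMat, colorMat, CV_BGRA2BGR);            // 4 -> 3 channels, colorMat is reallocated
cvtColor(colorMat, gray, CV_BGR2GRAY);
```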
Note that it is split into two conversions because further processing needs both the BGR and the gray image; that's why it isn't done in a single conversion step.
Another side remark: I also tested the OpenCV for iOS samples (Chapter 12: Processing video), which delivered the following when capturing at 30 fps:
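Since both the BGR and the gray image are needed, the two conversions could in principle be fused into one pass that reads each BGRA pixel only once. The following is a minimal plain C++ sketch, assuming a tightly packed BGRA buffer without row padding; the function name is hypothetical, and the gray weights are the standard 0.299/0.587/0.114 luma weights in 14-bit fixed point, so results may differ from OpenCV's cvtColor by rounding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: converts a packed BGRA image into a packed BGR image
// and an 8-bit gray image in a single pass over the source buffer.
void bgraToBgrAndGray(const std::vector<std::uint8_t>& bgra,
                      std::vector<std::uint8_t>& bgr,
                      std::vector<std::uint8_t>& gray,
                      std::size_t width, std::size_t height) {
    const std::size_t n = width * height;
    assert(bgra.size() == n * 4);
    bgr.resize(n * 3);
    gray.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        const std::uint8_t b = bgra[4 * i];
        const std::uint8_t g = bgra[4 * i + 1];
        const std::uint8_t r = bgra[4 * i + 2];
        // Drop alpha and keep the B, G, R order.
        bgr[3 * i] = b;
        bgr[3 * i + 1] = g;
        bgr[3 * i + 2] = r;
        // Y = 0.299 R + 0.587 G + 0.114 B, weights scaled by 2^14 and rounded.
        gray[i] = static_cast<std::uint8_t>(
            (4899u * r + 9617u * g + 1868u * b + (1u << 13)) >> 14);
    }
}
```

The resulting buffers could then be wrapped without copying, e.g. `Mat(height, width, CV_8UC3, bgr.data())` and `Mat(height, width, CV_8UC1, gray.data())`, as long as the vectors outlive those Mat headers.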
- iPhone 4: 5.6 fps
- iPad mini: 30.4 fps
My questions
Since it works very well on a wide range of devices, including other iOS devices, I conclude that the problem must be related to either the hardware or the software of the iPhone 4/4S.
Does anybody have a clue what might be going wrong here? Has anybody experienced similar issues? I found very little information on the internet about people experiencing the same performance issues (e.g. here and here).
I'm aware that the video size differs between devices, but two "simple" color conversions of a 1280x720 image should not take around 2 seconds, especially not on devices as recent as the iPhone 4 and 4S!
Any help, hints or experiences in this matter are highly appreciated!
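For a sense of scale: a 1280x720 frame has 921,600 pixels, so 2 seconds for the conversions amounts to roughly 2.2 µs per pixel. To get a baseline for what a plain loop achieves on the device, one could time one with std::chrono. The sketch below is a hypothetical micro-benchmark in standard C++11, independent of OpenCV; it uses a synthetic mid-gray frame rather than camera data, and the numbers it reports will depend on the compiler settings of the build.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical micro-benchmark: times a plain BGRA -> gray loop over a
// synthetic width x height frame and returns the elapsed time in milliseconds.
double timeBgraToGrayMs(std::size_t width, std::size_t height, std::vector<std::uint8_t>& gray) {
    const std::size_t n = width * height;
    std::vector<std::uint8_t> bgra(n * 4, 128);  // synthetic frame, every byte = 128
    gray.assign(n, 0);
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i) {
        const std::uint8_t* p = &bgra[4 * i];  // p[0] = B, p[1] = G, p[2] = R
        gray[i] = static_cast<std::uint8_t>(
            (1868u * p[0] + 9617u * p[1] + 4899u * p[2] + (1u << 13)) >> 14);
    }
    const auto stop = std::chrono::steady_clock::now();
    assert(stop >= start);
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling `timeBgraToGrayMs(1280, 720, gray)` on the iPhone 4S and on one of the faster devices, and logging the result next to the cvtColor timings, would show whether plain loops are slow in general or only the OpenCV call.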
Progress and further findings
Based on remi's comment, I experimented with alternative solutions. Unfortunately, even the following (very trivial) code is no faster:
```cpp
Mat colorMat, gray;
vector<Mat> channels;
AVDEBUG("starting", TAG,1);
colorMat = Mat(vHeight,vWidth,CV_8UC4, rImageData); // no data is copied
AVDEBUG("first", TAG, 1);
split(colorMat, channels);
AVDEBUG("intermediate " << colorMat.size(), TAG, 1);
// no BGRA2BGR conversion at all!!
gray = channels[0]; // take blue channel for gray
AVDEBUG("end", TAG, 1);
```
This produces the following output:
```
2014-07-24 09:07:41.763 CheckIfReal[604:3d03] AvCore-Debug: (Debug, Tag=CoreManager) Frame accepted (/Users/tbergmueller/Documents/dev/AvCore/avcore/CoreManager.cpp, line 591)
2014-07-24 09:07:41.765 CheckIfReal[604:3d03] AvCore-Debug: (Debug, Tag=CoreManager) starting (/Users/tbergmueller/Documents/dev/AvCore/avcore/CoreManager.cpp, line 636)
2014-07-24 09:07:41.771 CheckIfReal[604:3d03] AvCore-Debug: (Debug, Tag=CoreManager) first (/Users/tbergmueller/Documents/dev/AvCore/avcore/CoreManager.cpp, line 641)
2014-07-24 09:07:44.599 CheckIfReal[604:3d03] AvCore-Debug: (Debug, Tag=CoreManager) intermediate [720 x 1280] (/Users/tbergmueller/Documents/dev/AvCore/avcore/CoreManager.cpp, line 665)
2014-07-24 09:07:44.605 CheckIfReal[604:3d03] AvCore-Debug: (Debug, Tag=CoreManager) ending (/Users/tbergmueller/Documents/dev/AvCore/avcore/CoreManager.cpp, line 682)
```
Hence the Mat constructor is fast, because no data is copied (see the docs). However, split() takes almost 3 seconds in this code sample! Taking the blue channel as the gray Mat is then fast again, since only a Mat header is created.
This once again suggests that something is wrong with the loop implementation, since split() copies data, which is obviously done in a loop.
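According to the log timestamps, split() runs from 41.771 s to 44.599 s, i.e. about 2.83 s. To check whether a hand-written copy loop shows the same slowdown on the iPhone 4/4S, it could be compared against something like the following minimal sketch, which de-interleaves a packed 4-channel buffer into four separate planes. This is my assumption of what split() does for a CV_8UC4 image, not OpenCV's implementation; the function name is hypothetical and only standard C++11 is used.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: de-interleaves a packed 4-channel (e.g. BGRA) buffer
// into four planes, where planes[c][i] is channel c of pixel i.
std::array<std::vector<std::uint8_t>, 4> splitChannels4(const std::vector<std::uint8_t>& packed) {
    assert(packed.size() % 4 == 0);
    const std::size_t n = packed.size() / 4;
    std::array<std::vector<std::uint8_t>, 4> planes;
    for (auto& p : planes) p.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t c = 0; c < 4; ++c) {
            planes[c][i] = packed[4 * i + c];
        }
    }
    return planes;
}
```

If only the blue plane is used as gray, as in the snippet above, copying just channel 0 would avoid writing the other three planes.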