I've been looking for a solution for at least two days without success, so as a last resort I decided to ask here.

At work we have a C++ code base using OpenCV that we want to run on both iOS and Android. The whole thing turned out to run slower on Android and I can't find the reason. Profiling showed that the bottleneck is the method that invokes the C++ code, which is exactly the same on both platforms. On Android (Samsung Galaxy S4) it takes 140-150 ms to execute, while on iOS (iPhone 5) it takes under 70 ms. I've read some articles about optimizing native code and using different LOCAL_CFLAGS, but they didn't seem to help.

Is this something I just have to accept, or is there a solution? Thank you in advance, Mike

arrafutott
  • Those are two different CPUs. The performance difference isn't huge, and the premium brand is the faster one. Doesn't look surprising to me. Note that there will be even slower Android phones; some Chinese knock-offs can be 10x slower. – MSalters Sep 03 '14 at 10:18
  • That's true, but the Galaxy S4 has the better CPU on paper. Looks like that doesn't matter. – arrafutott Sep 03 '14 at 10:30
  • Make sure you use an armeabi-v7a build, not only a plain armeabi build. The plain armeabi builds don't use the FPU and thus are quite limited performance wise if the code uses lots of floats. (For integer operations the difference shouldn't be quite as big.) – mstorsjo Sep 03 '14 at 10:59
  • Sadly I'm using armeabi-v7a – arrafutott Sep 03 '14 at 11:06
  • There's no way you are going to get a useful answer with that problem statement. You might as well read [Pro Android Apps Performance Optimizations](http://people.cs.deu.edu.tr/semih/Android_Books/Apress%20Pro%20Android%20Apps%20Performance%20Optimization%20%282012%29.pdf). – jww Sep 03 '14 at 14:49

1 Answer

Your experience matches mine. From using OpenCV on iOS and Android (on a Nexus 4 in my case):

  1. Android is generally slower if you only use single-threaded code. Apple CPU cores are faster than any core I've tested on Android phones (see the many phone reviews available online), while the latest Android phones have 4 or more cores. On iOS, OpenCV uses GCD to run a few algorithms in parallel, but on Android it doesn't use OpenMP (which is the alternative, but only works with GCC 4.x, and not Clang). Sadly, using OpenMP outside the main thread is a pain because of a bug that is still present in r10 of the NDK, so either you recompile the toolchain with the patches, or you are stuck on the main thread, which is not the best option for heavy computation.

  2. OpenCV on Android, by default, comes compiled with Thumb instructions, which are slower. I suggest recompiling it with ARM mode and NEON turned on.

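  3. Autovectorization flags. If you are using GCC on the NDK, you have to use -O3 plus -funsafe-math-optimizations to enable autovectorization with NEON. GCC requires the second flag for floating-point code because ARMv7 NEON does not fully implement IEEE 754 (denormals are flushed to zero).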

  4. Throttling of the CPU frequency. My Nexus 4 seems to throttle the CPU frequency more enthusiastically than iOS. We've seen substantial swings in timings on Android for code that runs with very stable timings on iOS, and the only reason we can think of is the CPU frequency. Renderscript (see this answer) maxes out the CPU frequency, but the battery life will suffer (and you have to rewrite the code).
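
A rough way to check points 2 to 4 on your own build could look like the sketch below. It is not part of OpenCV or the NDK: `buildModeSummary`, `TimingStats` and `timeRepeated` are names made up for illustration. The only real things it relies on are the predefined macros `__thumb__`, `__arm__`, `__ARM_NEON` / `__ARM_NEON__` and `__OPTIMIZE__`, which GCC and Clang set according to the compile flags, and `std::chrono::steady_clock`.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <string>
#include <vector>

// Describe how THIS translation unit was compiled, based on macros that
// GCC and Clang predefine. It says nothing about prebuilt libraries.
std::string buildModeSummary() {
    std::string s;
#if defined(__thumb__)
    s += "thumb";      // compiled to Thumb instructions (point 2)
#elif defined(__arm__)
    s += "arm";        // compiled to 32-bit ARM instructions
#else
    s += "other";      // not a 32-bit ARM target (e.g. a desktop build)
#endif
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    s += "+neon";      // NEON is enabled for this file (points 2 and 3)
#endif
#if defined(__OPTIMIZE__)
    s += "+optimized"; // some -O level other than -O0 is in effect
#endif
    return s;
}

// Hypothetical helper type: spread of wall-clock timings in milliseconds.
struct TimingStats {
    double minMs;
    double medianMs;
    double maxMs;
};

// Call fn() `runs` times and report min / median / max duration.
// A large gap between min and max on an otherwise idle device is
// consistent with frequency scaling (point 4), though not proof of it.
template <typename Fn>
TimingStats timeRepeated(Fn&& fn, int runs) {
    assert(runs > 0);
    std::vector<double> ms;
    ms.reserve(static_cast<std::size_t>(runs));
    for (int i = 0; i < runs; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        fn();
        const auto t1 = std::chrono::steady_clock::now();
        ms.push_back(
            std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    std::sort(ms.begin(), ms.end());
    return {ms.front(), ms[ms.size() / 2], ms.back()};
}
```

You would wrap the call into your native method in a lambda, pass it to `timeRepeated` with, say, 50 runs, and log the three numbers together with `buildModeSummary()`. If the summary starts with `thumb` or lacks `+neon`, the flags from points 2 and 3 are probably not reaching the compiler for that file.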

user1906
  • Certain OEMs (e.g. Qualcomm) are extremely aggressive about CPU throttling. You can tell if that's a problem by continuously dragging your finger across the screen as your test runs -- part of the algorithm is to keep the clocks high when touch updates are detected so that the device feels responsive when interacting. It may also be worth looking into the hard float changes that went in r9b (http://stackoverflow.com/questions/3004915/getting-hardware-floating-point-with-android-ndk), though I doubt that'd get you more than 10% or so. – fadden Sep 03 '14 at 15:24
  • I read about NEON on several sites and they said to use different cflags, but it didn't seem to work. Am I missing something? Do I have to change my code to get the boost from NEON? I forgot to mention that the C++ code uses 4 threads, so 1) may not help. 2) and 3) brought back hope, though I've already added the -O3 flag. So the question is: how does NEON work? – arrafutott Sep 04 '14 at 08:58
  • Without knowing anything about the code in particular, it's difficult to tell why your code is slower. Have you done any profiling? Which parts are the slowest? NEON is the equivalent of SSE/AVX on x86 machines, which means that if the compiler doesn't vectorize your code for you, you will have to do it yourself, or use a library that already has a NEON implementation. – user1906 Sep 05 '14 at 00:27
  • Yes, I have. I found out I can compile OpenCV with NEON using flags in CMakeLists.txt. Another question: do I have to write NEON code in my own C++ code base, or is it used automatically? – arrafutott Sep 09 '14 at 11:47
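
To illustrate what "do it yourself" from the comment above can mean, here is a minimal sketch of a hand-vectorized loop. The function `scaleAdd` is a made-up example, not taken from the asker's code or from OpenCV. The intrinsics (`vdupq_n_f32`, `vld1q_f32`, `vmlaq_f32`, `vst1q_f32`) come from `arm_neon.h`; the NEON branch is only compiled when the compiler defines `__ARM_NEON` or `__ARM_NEON__`, which on 32-bit ARM means NEON has to be enabled in the build (for example with `-mfpu=neon` on GCC). Everywhere else the plain loop runs.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// Hypothetical example kernel: out[i] = a[i] * scale + b[i] for i in [0, n).
void scaleAdd(const float* a, const float* b, float scale,
              float* out, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr && out != nullptr));
    std::size_t i = 0;

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    // NEON path: process four floats per iteration.
    const float32x4_t vscale = vdupq_n_f32(scale);
    for (; i + 4 <= n; i += 4) {
        const float32x4_t va = vld1q_f32(a + i);
        const float32x4_t vb = vld1q_f32(b + i);
        // vmlaq_f32(acc, x, y) computes acc + x * y per lane.
        vst1q_f32(out + i, vmlaq_f32(vb, va, vscale));
    }
#endif

    // Scalar path: handles the tail on NEON builds and the whole
    // array on builds without NEON.
    for (; i < n; ++i) {
        out[i] = a[i] * scale + b[i];
    }
}
```

Whether this beats what the compiler generates on its own depends on the loop and the flags, so it is worth timing both versions on the device before keeping the intrinsics.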