I wrote an image processing app for Android (https://play.google.com/store/apps/details?id=cv.cvExperiments) with some C++ code wrapped with JNI. To get some speedup on multicore processors, I annotated the expensive loops with OpenMP "parallel for" directives.
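To make the setup concrete, the loops look roughly like the sketch below. It is only an illustration, not code from the app: the function `threshold`, its parameters, and the row-major 8-bit grayscale layout are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-pixel kernel: binarize a grayscale image.
// Built with -fopenmp, the rows are divided among the threads;
// without that flag the pragma is ignored and the loop runs serially.
void threshold(const std::vector<std::uint8_t>& src,
               std::vector<std::uint8_t>& dst,
               int width, int height, std::uint8_t level)
{
    assert(src.size() == static_cast<std::size_t>(width) * height);
    dst.resize(src.size());

    #pragma omp parallel for
    for (int y = 0; y < height; ++y) {
        // Each iteration touches only its own row, so no synchronization is needed.
        const std::uint8_t* in  = src.data() + static_cast<std::size_t>(y) * width;
        std::uint8_t*       out = dst.data() + static_cast<std::size_t>(y) * width;
        for (int x = 0; x < width; ++x)
            out[x] = in[x] > level ? 255 : 0;
    }
}
```

Parallelizing over rows rather than individual pixels keeps the amount of work per thread large compared with the cost of starting the parallel region.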
The thing is that on x86 I get a speedup ranging from 3x to 5x on a 4-core processor, but on Android, enabling OpenMP (with -fopenmp) does not give any speedup on 32-bit ARM and even slows down the code on a 64-bit ARMv8 Snapdragon 810.
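For anyone who wants to reproduce the comparison, here is a minimal timing sketch. It is not the benchmark I actually ran: `sum_squares` is a hypothetical stand-in for an expensive image loop, and `time_ms` is a hypothetical helper that reports the best wall-clock time over several runs. The only OpenMP runtime call used is `omp_set_num_threads`, guarded so that the file still builds without -fopenmp.

```cpp
#include <cassert>
#include <chrono>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical workload: sum of squares over a buffer.
double sum_squares(const std::vector<float>& data)
{
    double total = 0.0;
    const long n = static_cast<long>(data.size());
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < n; ++i)
        total += static_cast<double>(data[i]) * data[i];
    return total;
}

// Best wall-clock time in milliseconds over `reps` runs using `threads` threads.
// The last computed sum is stored in `result` so the work cannot be optimized away.
double time_ms(const std::vector<float>& data, int threads, int reps, double& result)
{
    assert(reps > 0);
#ifdef _OPENMP
    omp_set_num_threads(threads);
#else
    (void)threads;  // serial build: the thread count has no effect
#endif
    double best = -1.0;
    for (int r = 0; r < reps; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        result = sum_squares(data);
        const auto t1 = std::chrono::steady_clock::now();
        const double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
        if (best < 0.0 || ms < best) best = ms;
    }
    return best;
}
```

The speedup is then `time_ms(data, 1, reps, r) / time_ms(data, 4, reps, r)`. Taking the best of several runs reduces the influence of one-off costs such as thread pool creation on the first parallel region.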
Did I miss something? Has anybody ever observed speedups on Android + ARM comparable to those on x86 CPUs?
There are lots of tutorials on the internet on how to enable OpenMP, but no benchmarks showing speedups. Any pointers?
The only relevant piece of information I found is a benchmark of the OpenMP overhead on ARMv8, whose authors also noticed some pretty high overhead: https://wiki.linaro.org/WorkingGroups/Middleware/Graphics/GPGPU/Docs/OpenMPforARMv8PortAnalysis
Thanks, Matthieu