I implemented an algorithm on android using OpenCL and OpenMP. The OpenMP implementation runs about 10 times slower than the OpenCL one.
- OpenMP: ~250 ms
- OpenCL: ~25 ms
But overall, if I measure the time from the java android side, I get roughly the same time to call and get my values.
For example:
Java code:
// calls C implementation using JNI (Java Native Interface) bool useOpenCL = true; myFunction(bitmap, useOpenCL); // ~300 ms, timed with System.nanoTime() here, but omitted code for clarity myFunction(bitmap, !useOpenCL); // ~300 ms, timed with System.nanoTime() here, but omitted code for clarity
C code:
JNIEXPORT void JNICALL Java_com_xxxxx_myFunctionNative(JNIEnv * env, jobject obj, jobject pBitmap, jboolean useOpenCL) { // same before, setting some variables clock_t startTimer, stopTimer; startTimer = clock(); if ((bool) useOpenCL) { calculateUsingOpenCL(); // runs in ~25 ms, timed here, using clock() } else { calculateUsingOpenMP(); // runs in ~250 ms } stopTimer = clock(); __android_log_print(ANDROID_LOG_VERBOSE, APPNAME, "Time in ms: %f\n", 1000.0f* (float)(stopTimer - startTimer) / (float)CLOCKS_PER_SEC); // same from here on, e.g.: copying values to java side }
The Java code, in both cases executes roughly in the same time, around 300 ms. To be more precise, elapsedTime
is a bit more for OpenCL, that is OpenCL is slower on average.
Looking at the individual run-times of the OpenMP, and OpenCL implementations, OpenCL version should be much faster overall. But for some reason, there is an overhead that I cannot find.
I also compared OpenCL vs Normal native code (no OpenMP), I still got the same results, with roughly same runtime overall, even though the calculateUsingOpenCL
ran at least 10 times faster.
Ideas:
Maybe the GPU (in OpenCL case) is less efficient in general, because it has less memory available. There are few variables that we need to preallocate, which are used every frame. So, we checked the time it takes for android to draw a bitmap in both cases (OpenMP, OpenCL). In the OpenCL case, sometimes drawing a bitmap took longer (3 times longer), but not by the amount that would equalize the overall run time of the program.
Does JNI use GPU to accelerate some calls, which could cause the OpenCL version to be slower?
EDIT:
- Is it possible that Java Garbage collection is triggered by OpenCL, causing the big overhead?