I have a few basic algorithms (DCT/IDCT and a few others) ported and working (as expected, at least functionally) on a Nexus 10. Since these are first implementations, their execution time currently runs into seconds, which is understandable.
However, given the architecture of RenderScript, I see that these algorithms run on either the CPU or the GPU depending on other concurrent application activity. For instance, my application has a ScrollView for images, and any activity on this view pushes the RenderScript execution onto the CPU; when there is no activity, the algorithm runs on the GPU. I can watch this happen live via ARM DS-5 Mali/A15 traces.
This situation is turning into a debug/tuning nightmare: the performance delta between the algorithm running on the CPU (dual-core A15) and on the GPU (Mali) is on the order of 2 seconds, which makes it very difficult to gauge whether the changes I make to my algorithm code actually improve performance.
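For context, this is essentially how I take the measurements; a minimal sketch, where ScriptC_dct / forEach_dct stand in for my actual generated script class and kernel name. The rs.finish() calls matter because forEach only enqueues work asynchronously:

    import android.renderscript.Allocation;
    import android.renderscript.RenderScript;

    // Rough timing harness; ScriptC_dct / forEach_dct are placeholders
    // for my actual generated script class and kernel.
    long timeKernelNs(RenderScript rs, ScriptC_dct script,
                      Allocation in, Allocation out) {
        rs.finish();                   // drain any previously queued work
        long start = System.nanoTime();
        script.forEach_dct(in, out);   // enqueue the kernel (asynchronous)
        rs.finish();                   // block until the kernel completes
        return System.nanoTime() - start;
    }

Since the same harness is used for every run, the only variable left is which device the kernel actually landed on, and that is exactly what I cannot control.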
Is there a way to get around this problem? One possible solution would be a debug-time configuration option to choose the target (CPU or GPU) for the RenderScript code.
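If nothing like that exists, one partial workaround I may try, based purely on my reading of the RenderScript.ContextType docs and not verified on the Nexus 10: RenderScript.create() can take a ContextType argument, and ContextType.DEBUG selects the debug runtime, which as far as I can tell always executes on the CPU reference path. That would at least pin the script to one device while I tune:

    import android.content.Context;
    import android.renderscript.RenderScript;

    // Assumption on my part: a DEBUG context uses the CPU reference
    // runtime and so bypasses the Mali driver entirely. Unverified on
    // the Nexus 10; requires API 18+.
    RenderScript rsCpu = RenderScript.create(ctx, RenderScript.ContextType.DEBUG);
    ScriptC_dct script = new ScriptC_dct(rsCpu);  // same placeholder class as above

I have also seen passing mention of a system property (debug.rs.default-CPU-driver) that forces the default CPU driver on rooted/engineering builds, but I have not been able to verify that either.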