45

I had been wondering whether it was possible to use OpenCL on Android, found out that it wasn't, and dropped the subject altogether. But thanks to the January 14th post on the official Android Developers blog (http://android-developers.blogspot.fr/2013/01/evolution-of-renderscript-performance.html), I discovered that parallel programming has been possible since Android 4.0 thanks to RenderScript, an API that shares quite a few features with OpenCL.

What I'm wondering now is: why did Google choose to implement this new solution instead of pushing OpenCL forward (an open specification now handled by the Khronos Group)?

I mean, I know, it's not really hard to convert from one to the other, but still...

Anyway, if anyone has the real explanation, please let me know!

Redwarp
  • 1
    The discussion continued on LinkedIn: http://www.linkedin.com/groups/Why-Google-choose-RenderScript-instead-1729897.S.236075762 – arrayfire May 02 '13 at 18:11

2 Answers

48

Apple holds the trademark on OpenCL. Google competes with Apple. Perhaps it's really that simple.

We've done work on OpenCL with Android (see here) and are happy to see it moving forward thanks to the work of Intel, Imagination, and other chip makers. Google will turn around soon enough.

arrayfire
  • 1
    Why isn't this answer higher up? – Utkarsh Sinha Oct 18 '16 at 20:25
  • 1
    @UtkarshSinha: It's SO policy that the accepted answer always comes first. Even if another answer has 100x as many votes. There's only two answers. This one is as high as it can get. – hippietrail Sep 15 '19 at 10:28
  • Link for the mentioned blog changed to https://arrayfire.com/blog/opencl-on-mobile-devices/ (instead of http://arrayfire.com/opencl-on-mobile-devices/ ) – fhunter Oct 06 '22 at 14:42
31

The answer is that Android's needs are very different from what OpenCL tries to provide.

OpenCL uses the execution model first introduced in CUDA. In this model, a kernel is made up of one or many groups of workers, and each group has fast shared memory and synchronization primitives within that group. What this does is cause the description of an algorithm to be intermingled with how that algorithm should be scheduled on a particular architecture (because you're deciding the size of a group and when to synchronize within that group).
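To make that model concrete, here is a minimal pure-Python emulation of a work-group sum reduction. It is not real OpenCL: the names `group_reduce_sum` and `group_size` are hypothetical, a plain list stands in for a group's local memory, and the sketch assumes the group size is a power of two. Note how the group size shapes the algorithm itself (the padding, the stride loop, and the points where a barrier would be needed).

```python
def group_reduce_sum(data, group_size):
    """Emulate a work-group tree reduction; returns one partial sum per group.

    Assumes group_size is a power of two (an assumption of this sketch).
    """
    if group_size < 1 or group_size & (group_size - 1):
        raise ValueError("group_size must be a power of two")
    partial_sums = []
    for start in range(0, len(data), group_size):
        # "Local memory": one slot per work-item, zero-padded for the last group.
        local = list(data[start:start + group_size])
        local += [0] * (group_size - len(local))
        stride = group_size // 2
        while stride > 0:
            # Work-items with local id < stride each add one partner element.
            for local_id in range(stride):
                local[local_id] += local[local_id + stride]
            # A real kernel would need a group-wide barrier here
            # before moving on to the next, smaller stride.
            stride //= 2
        partial_sums.append(local[0])
    return partial_sums


data = list(range(1000))
partials = group_reduce_sum(data, group_size=256)
total = sum(partials)
```

Changing `group_size` changes how many partial sums come back and how many barrier steps run, which is the coupling between algorithm and scheduling described above.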

That's great when you're writing for one architecture and you want absolute peak performance, but it gets peak performance at the expense of performance portability. Maybe on your architecture, you have enough registers and shared memory to run 256 workers per group for best performance, but on another architecture, you'd end up with massive register spills with anything above 128 workers per group, causing an 80% performance regression. Meanwhile, because your code is written explicitly for 256 workers per group, the runtime can't do anything to try to improve the situation on another architecture--it has to obey what you've written. This sort of situation is common when moving from architecture to architecture on the desktop/HPC side of GPU compute.
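The arithmetic behind that kind of regression can be sketched roughly as follows. Everything here is an assumption for illustration: the device names, the register-file sizes, and the 64 registers per worker are made-up numbers, and the model simplifies by assuming one group must fit entirely in one compute unit's register file.

```python
def max_group_size_without_spill(registers_per_unit, registers_per_item):
    """Largest power-of-two group size whose work-items fit in the register file."""
    limit = registers_per_unit // registers_per_item
    size = 1
    while size * 2 <= limit:
        size *= 2
    return size if limit >= 1 else 0


# Hypothetical numbers, not measurements from any real GPU.
HARD_CODED_GROUP_SIZE = 256
REGISTERS_PER_ITEM = 64
devices = {"device_a": 16384, "device_b": 8192}

fits = {
    name: HARD_CODED_GROUP_SIZE
    <= max_group_size_without_spill(registers, REGISTERS_PER_ITEM)
    for name, registers in devices.items()
}
```

Under these assumed numbers the hard-coded 256 fits on `device_a` but exceeds the 128-worker budget on `device_b`, where a real compiler would have to spill registers to slower memory.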

On mobile, Android needs performance portability between many different GPU and CPU vendors with very different architectures. If Android were to rely on a CUDA-style execution model, it would be almost impossible to write a single kernel and have it run acceptably on a range of devices.

RenderScript abstracts control over scheduling away from the developer at the cost of some peak performance; however, we're constantly closing the gap in terms of what's possible with RenderScript. For example, ScriptGroup, an API introduced in Android 4.2, is a big part of our plans to further improve GPU code generation. There are plenty of new improvements coming that make writing fast code even easier, too.
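As a rough illustration of that division of labor (plain Python, not the RenderScript API), the sketch below has the developer write only a per-element function, while a stand-in "runtime" decides how to chunk and distribute the work. The names `for_each` and `brighten`, and the worker and chunk values, are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def for_each(kernel, values, workers, chunk_size):
    """Apply kernel to every element; how the work is split is the runtime's choice."""
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves chunk order, so the output order matches the input.
        results = list(pool.map(lambda chunk: [kernel(v) for v in chunk], chunks))
    return [out for chunk in results for out in chunk]


def brighten(pixel):
    """Per-element kernel: knows nothing about groups, chunks, or threads."""
    return min(pixel + 40, 255)


pixels = list(range(0, 256, 3))
small_device = for_each(brighten, pixels, workers=2, chunk_size=4)
large_device = for_each(brighten, pixels, workers=8, chunk_size=64)
```

Because the kernel never mentions scheduling, both calls produce the same output, and the runtime is free to pick whatever split suits the hardware.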

Tim Murray
  • 30
    OpenCL has a "let the runtime decide the group size" feature--just pass NULL for the group size in clEnqueueNDRangeKernel. – Andreas Klöckner Apr 27 '13 at 14:53
  • 7
    Great OpenCL libraries already account for these optimal parameter adaptions. It's super simple to have your code automatically change parameters based on the runtime HW type. – arrayfire May 02 '13 at 18:15
  • I should also point out that this has been discussed here more recently: https://code.google.com/p/android/issues/detail?id=36361 , with interesting comments from both sides. – Brad Larson Aug 12 '13 at 19:11
  • 2
    I would caution against going too far into RS Compute (though certainly explore your options). Google did a similar project for graphics called RS Graphics. Devs pushed back and wanted OpenGL, and Google deprecated RS Graphics. Discussion is here: https://groups.google.com/forum/#!topic/android-developers/m194NFf_ZqA As you can see from the post Brad mentioned above, a majority of developers are not happy with this decision and prefer an open standard that gives them much more control, like OpenCL. Even John Carmack recently said this was a "WTF?" decision. It's likely to happen again. – Jim V Aug 12 '13 at 21:15
  • 37
    Please note, Tim Murray works on the Renderscript team at Google and has a vested interest in the success of Renderscript. And he is stating incorrect information here: "because you're deciding the size of a group and when to synchronize within that group". As mentioned above, you can let the runtime decide the group size. – Jim V Aug 12 '13 at 21:19
  • 13
    For those who are flagging this answer, please stop. While you may disagree with the argument posed here, this was a question asking why Google chose to go with Renderscript, and a Google employee has provided their position on it. This is a viable answer to the question asked. If you wish to counter the assertions in this answer, use comments instead to explain why they might be incorrect. – Brad Larson Aug 12 '13 at 22:04
  • 19
    @BradLarson, as the OpenCL community is quite frustrated with Google's not-invented-here handling of this open standard, what do you suggest we do? Google tries to avoid any discussion with us, but keeps repeating the false information above about OpenCL. – Vincent.StreamComputing Aug 12 '13 at 22:15
  • 2
    @BradLarson, at your link developers stated their opinion, and Google refused it without discussion. When developers pushed back, Google again said, in effect, tough, don't use Android. Devs pushed back again and were told no, and that they don't matter. When devs pushed back yet again, Google simply closed the comments. That isn't a discussion; it is just talking to a wall. – Jim V Aug 13 '13 at 02:32
  • 2
    @JimV - They were all being flagged for employing personal attacks, and were removed by another moderator. I've restored one, without the unnecessarily harsh language. In regards to your comment, Stack Overflow is not the place to push on developers to change policy. If you wish to counter technical arguments they've made here, fine, but I don't want this devolving into a flame war. – Brad Larson Aug 13 '13 at 02:36
  • 3
    As noted in previous comments, this answer is outright factually wrong. – user703016 Mar 01 '15 at 08:26
  • 2
    I'm against vendor lock-in even though it sells. Google could still implement Renderscript on top of OpenCL and expose both APIs; that would be a pragmatic solution. What is the next step? Will Google try to block other Java bindings to the OpenGL ES APIs? Renderscript isn't a cross-platform processing API and it serves different needs; I won't use a platform-specific API for something I do in desktop and embedded environments. Ideally, Google should contribute to OpenCL to help make it better. End users should choose Android because it fits their needs, not because of the lack of OpenCL support. – gouessej Aug 11 '15 at 11:27