92

What features make OpenCL worth choosing over OpenGL with GLSL for calculations? Despite the graphics-related terminology and impractical data types, is there any real caveat to using OpenGL?

For example, parallel function evaluation can be done by rendering to a texture while sampling from other textures. Reduction operations can be done by iteratively rendering to smaller and smaller textures. On the other hand, random write access is not possible in any efficient manner (the only way is to render triangles using texture-driven vertex data). Is this possible with OpenCL? What else is possible with OpenCL that is not possible with OpenGL?

Michael Durrant
  • 93,410
  • 97
  • 333
  • 497
dronus
  • 10,774
  • 8
  • 54
  • 80
  • 1
    Another interesting question would be if OpenGL can offer something that OpenCL can't. For example, OpenGL will automatically interpolate vertex data that has been declared with the `varying`-keyword, for you. How would you achieve the corresponding thing in OpenCL? – HelloGoodbye Jan 12 '14 at 01:53
  • I think that could easily be done by interpolating based on some index given to the compute kernel for every invocation. – dronus Jan 12 '14 at 11:41
  • 1
    It's 2015 and there is still no reliable access to OpenCL on all platforms; I'm still curious what kind of computation can be achieved with OpenCL but not with OpenGL 2.0. – dronus Apr 11 '15 at 07:43
  • 1) An OpenCL device can be a CPU, without any GPU, and still work where graphics rendering fails entirely. – xakepp35 Jul 18 '17 at 06:34
  • 2) Consider which stack is thinner, e.g. on a bare-bones Linux kernel: OpenCL only requires something simple like a driver, amdgpu-pro, shipped with all the necessary libs (I built an OpenCL miner firmware with only a 50 MB footprint), whereas a renderer (150+ MB) requires much more messing around: several heavy frameworks, an X server, things done inside mesa3d/gallium and so on. What is all that for, if your only task is to compute, you have no running X server and not even a monitor attached? So, basically, GL is more "junk-overloaded" than CL, in order to support everything developed for graphics over the years. – xakepp35 Jul 18 '17 at 06:41

11 Answers

71

OpenCL was created specifically for computing. When you do scientific computing using OpenGL, you always have to think about how to map your problem onto the graphics context (i.e. talk in terms of textures and geometric primitives like triangles) in order to get your computation going.

In OpenCL you just formulate your computation as a kernel operating on a memory buffer and you are good to go. This is actually a BIG win (said from the perspective of having thought through and implemented both variants).
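
For illustration, a minimal sketch of what such a kernel looks like (hypothetical names; the host side only has to create a buffer, set the arguments and enqueue the kernel over N work-items):

```c
// Element-wise operation on a plain memory buffer: no textures,
// no framebuffer, no quad to rasterize.
__kernel void scale(__global float* data, const float factor)
{
    size_t i = get_global_id(0);   // one work-item per element
    data[i] *= factor;
}
```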

The memory access patterns are the same, though (your calculation is still happening on a GPU, but GPUs are getting more and more flexible these days).

But what more could you ask for than dozens of parallel "CPUs", without having to rack your brain over how to translate, say (silly example), a Fourier transform into triangles and quads?

cli_hlt
  • 7,072
  • 2
  • 26
  • 22
  • 1
    Fourier to triangles and quads... well, with a simple scaffold of rendering one large quad onto a texture we get a simple parallel mapping of one or more large memory blocks to another. With textures of different scales it's also easy to map a different number (usually 2^n) of values onto another. That's not too much GL code and it fits a large class of problems. So I'd like to know what more OpenCL can do... – dronus Dec 07 '11 at 20:35
  • 3
    By using OpenCL you simply omit the mapping altogether, avoid writing the shaders that have to deal with geometry and fragments, avoid thinking about the various coordinate transformations (world, screen/buffer, texture) and directly express your algorithm the way you learnt it in your numerics class. I haven't had a problem with the first, but I like the latter more. And well, I didn't come up with the idea of OpenCL in the first place - but as somebody else did, why shouldn't it be put to its intended use? GPGPU was cool for its time; now just use OpenCL. – cli_hlt Dec 07 '11 at 21:26
  • 7
    @cli_hlt, OpenCL is also GPGPU. – Simon Jul 13 '13 at 16:31
  • @Simon In a broad sense, yes, you are right. But according to Wikipedia, "General-purpose computing on graphics processing units (GPGPU, rarely GPGP or GP²U) is the utilization of a graphics processing unit (GPU), which typically handles computation only for computer graphics, to perform computation in applications traditionally handled by the central processing unit (CPU)" (they have additional references that I omit here). With OpenCL, the "which typically handles computation only for computer graphics" part no longer applies. So it is not GPGPU in the original meaning. – cli_hlt Oct 24 '14 at 20:26
  • @cli_hlt: Perhaps, but the *devices* are still meant primarily for computer graphics. They are still called GPUs, after all! – Tim Čas Jan 19 '16 at 02:03
  • "using OpenGL you always have to think about how to map your computing problem to the graphics context" is this also true with GLSL? I thought the whole point of GLSL was, that you could compute arbitrary data. – wotanii Dec 13 '16 at 14:56
68

Something that hasn't been mentioned in any answer so far is speed of execution. If your algorithm can be expressed in OpenGL graphics (e.g. no scattered writes, no local memory, no workgroups, etc.) it will very often run faster than an OpenCL counterpart. My specific experience of this has been doing image filter (gather) kernels across AMD, nVidia, IMG and Qualcomm GPUs. The OpenGL implementations invariably run faster even after hardcore OpenCL kernel optimization. (Aside: I suspect this is due to years of hardware and drivers being specifically tuned to graphics-oriented workloads.)
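
For clarity, by a gather kernel I mean something along these lines (a rough OpenCL C sketch with hypothetical names): every work-item only reads a neighbourhood and writes a single output pixel, which is the same access pattern a fragment shader uses.

```c
// Gather-style 3x3 box filter. Each work-item reads its neighbourhood
// and writes exactly one pixel of the destination image.
__constant sampler_t smp = CLK_NORMALIZED_COORDS_FALSE |
                           CLK_ADDRESS_CLAMP_TO_EDGE |
                           CLK_FILTER_NEAREST;

__kernel void box3x3(read_only image2d_t src, write_only image2d_t dst)
{
    int2 pos = (int2)(get_global_id(0), get_global_id(1));
    float4 sum = (float4)(0.0f);
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx)
            sum += read_imagef(src, smp, pos + (int2)(dx, dy));
    write_imagef(dst, pos, sum / 9.0f);
}
```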

My advice would be that if your compute program feels like it maps nicely to the graphics domain then use OpenGL. If not, OpenCL is more general and makes compute problems simpler to express.

Another point to mention (or to ask) is whether you are writing as a hobbyist (i.e. for yourself) or commercially (i.e. for distribution to others). While OpenGL is supported pretty much everywhere, OpenCL totally lacks support on mobile devices and, imho, is highly unlikely to appear on Android or iOS in the next few years. If wide cross-platform compatibility from a single code base is a goal, then OpenGL may be forced upon you.

user2746401
  • 3,157
  • 2
  • 21
  • 46
  • I think this answer really needs more upvotes to show up earlier in this thread. Performance considerations and mobile device compatibility should be critical aspects to consider first... at least the performance considerations, in case you have no interest in mobile (but today, how can't you or, rather, how can you afford not to? :p) – warship Jul 06 '16 at 05:43
  • How can OpenGL be faster than OpenCL? It does much more and the overhead of managing OpenGL state is high. Did you compare to OpenCL with native_* functions? What kind of operations did you compare? Can you publish the code? – Yoav Aug 15 '16 at 20:53
  • 2
    Hi Ben-Uri. Sadly I can't share code. You are right about GL state being rather heavy but well written GL code can mostly avoid state changes, especially for compute-like tasks (Vulkan is way better in this respect btw). Individual operations tend to be about the same between GL/CL but the GLSL compilers seem more mature and produce overall tighter code. Also, for structured writes, GL pixel shaders can make use of the render output units (ROPs) whereas CL must use the generic memory subsystem (slower) as it (usually) cannot be known at compile time if the writes will be structured. – user2746401 Aug 16 '16 at 12:05
28

What features make OpenCL worth choosing over OpenGL with GLSL for calculations? Despite the graphics-related terminology and impractical data types, is there any real caveat to using OpenGL?

Yes: it's a graphics API. Therefore, everything you do in it has to be formulated along those terms. You have to package your data as some form of "rendering". You have to figure out how to deal with your data in terms of attributes, uniform buffers, and textures.

With OpenGL 4.3 and OpenGL ES 3.1 compute shaders, things become a bit more muddled. A compute shader is able to access memory via SSBOs/Image Load/Store in similar ways to OpenCL compute operations (though OpenCL offers actual pointers, while GLSL does not). Their interop with OpenGL is also much faster than OpenCL/GL interop.

Even so, compute shaders do not change one fact: OpenCL compute operations operate at a very different precision than OpenGL's compute shaders. GLSL's floating-point precision requirements are not very strict, and OpenGL ES's are even less strict. So if floating-point accuracy is important to your calculations, OpenGL will not be the most effective way of computing what you need to compute.

Also, OpenGL compute shaders require 4.x-capable hardware, while OpenCL can run on much less capable hardware.

Furthermore, if you're doing compute by co-opting the rendering pipeline, OpenGL drivers will still assume that you're doing rendering. So it's going to make optimization decisions based on that assumption. It will optimize the assignment of shader resources assuming you're drawing a picture.

For example, if you're rendering to a floating-point framebuffer, the driver might just decide to give you an R11_G11_B10 framebuffer, because it detects that you aren't doing anything with the alpha and your algorithm could tolerate the lower precision. If you use image load/store instead of a framebuffer however, you're much less likely to get this effect.

OpenCL is not a graphics API; it's a computation API.

Also, OpenCL just gives you access to more stuff. It gives you access to memory levels that are implicit in GL. Certain memory can be shared between threads, but separate shader instances in GL are unable to directly affect one another (outside of Image Load/Store, but OpenCL also runs on hardware that doesn't have access to that).
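
As an illustration (a sketch only, hypothetical kernel name, assuming a power-of-two work-group size): a reduction where work-items cooperate through `__local` memory and a barrier, something separate fragment shader invocations cannot do. The `scratch` buffer is supplied from the host via `clSetKernelArg` with a size and a NULL pointer.

```c
// Each work-group sums its slice of the input and writes one partial
// result. The __local scratch buffer is shared by the whole work-group.
__kernel void partial_sum(__global const float* in,
                          __global float* partial,
                          __local float* scratch)
{
    size_t lid = get_local_id(0);
    scratch[lid] = in[get_global_id(0)];
    barrier(CLK_LOCAL_MEM_FENCE);            // make all loads visible to the group

    for (size_t s = get_local_size(0) / 2; s > 0; s >>= 1) {
        if (lid < s)
            scratch[lid] += scratch[lid + s];
        barrier(CLK_LOCAL_MEM_FENCE);
    }
    if (lid == 0)
        partial[get_group_id(0)] = scratch[0];
}
```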

OpenGL hides what the hardware is doing behind an abstraction. OpenCL exposes you to almost exactly what's going on.

You can use OpenGL to do arbitrary computations. But you don't want to; not while there's a perfectly viable alternative. Compute in OpenGL lives to service the graphics pipeline.

The only reason to pick OpenGL for any kind of non-rendering compute operation is to support hardware that can't run OpenCL. At the present time, this includes a lot of mobile hardware.

Nicol Bolas
  • 449,505
  • 63
  • 781
  • 982
  • 6
    'OpenGL hides what the hardware is doing behind an abstraction. OpenCL exposes you to almost exactly what's going on.' That is still at an abstract level, I think. GPUs have fixed modules (like render output units and texture mapping units) that are exposed as OpenGL features. – dronus Feb 17 '12 at 14:37
  • 1
    @ybungalobill According to the description of `glTexImage2D`, "The GL will choose an internal representation that closely approximates that requested by internalFormat, but it may not match exactly". – GuyRT Jun 30 '14 at 11:02
  • 1
    @GuyRT: It usually *does* give you 32F for 32F --- the typical change is a different order of channels, though (e.g. BGRA instead of RGBA). – Tim Čas Jan 19 '16 at 02:07
  • Does this answer refer to "OpenGL/GSLS" or just OpenGL? – wotanii Dec 13 '16 at 14:57
  • 1
    @wotanii: GLSL is the shading language used by OpenGL. So there is no "just OpenGL". – Nicol Bolas Dec 13 '16 at 15:19
  • I'm new to computer graphics and vision. Which one should I choose if I want to create a human body mesh, capture its motion for machine learning, and apply the result to new ones? – DragonKnight Oct 27 '17 at 09:41
12

Although currently OpenGL would be the better choice for graphics, this is not permanent.

It could be practical for OpenGL to eventually be merged in as an extension of OpenCL. The two platforms are about 80% the same, but have different syntax quirks and different nomenclature for roughly the same components of the hardware. That means two languages to learn and two APIs to figure out. Graphics driver developers would prefer a merge because they would no longer have to develop for two separate platforms. That leaves more time and resources for driver debugging. ;)

Another thing to consider is that the origins of OpenGL and OpenCL are different: OpenGL began and gained momentum during the early fixed-pipeline-over-a-network days and was slowly extended, with parts deprecated, as the technology evolved. OpenCL, in some ways, is an evolution of OpenGL in the sense that OpenGL started being used for numerical processing as the (unplanned) flexibility of GPUs allowed it. "Graphics vs. computing" is really more of a semantic argument. In both cases you're always trying to map your math operations to hardware with the highest performance possible. There are parts of GPU hardware which vanilla CL won't use, but that won't keep a separate extension from doing so.

So how could OpenGL work under CL? Speculatively, triangle rasterizers could be enqueued as a special CL task. Special GLSL functions could be implemented in vanilla OpenCL, then overridden to hardware-accelerated instructions by the driver during kernel compilation. Writing a shader in OpenCL, provided the library extensions were supplied, doesn't sound like a painful experience at all.

To claim that one has more features than the other doesn't make much sense, as they're both gaining roughly the same 80% of features, just under different nomenclature. To claim that OpenCL is not good for graphics because it is designed for computing doesn't make sense either, because graphics processing is computing.

user515655
  • 989
  • 2
  • 10
  • 24
12

One notable feature would be scattered writes; another would be the absence of "Windows 7 smartness". Windows 7 will, as you probably know, kill the display driver if OpenGL does not flush for 2 seconds or so (don't hold me to the exact time, but I think it's 2 secs). This can be annoying if you have a lengthy operation.

Also, OpenCL obviously works with a much greater variety of hardware than just the graphics card, and it does not have a rigid graphics-oriented pipeline with "artificial constraints". It is easier (trivial) to run several concurrent command streams too.
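
To make the scattered-writes point concrete, here is a sketch (hypothetical kernel, 256 zero-initialized bins assumed) of something a fragment shader cannot express directly, because each invocation writes to a data-dependent location:

```c
// Naive histogram: every work-item writes to a bin chosen by the value
// it reads, i.e. a scattered write, made safe with an atomic (OpenCL 1.1+).
__kernel void histogram(__global const uchar* pixels,
                        __global volatile uint* bins)
{
    uchar v = pixels[get_global_id(0)];
    atomic_inc(&bins[v]);
}
```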

Damon
  • 67,688
  • 20
  • 135
  • 185
  • +1 for mentioning scattering, though recent extensions (like `shader_image_load_store`) work on that, or you could use the geometry shader to generate additional points or select different output targets. But nothing compared to the flexibility of OpenCL. – Christian Rau Oct 26 '11 at 19:15
  • The thing is that you don't know at all what happens, because everything is essentially driver-dependent. Of course you can do e.g. random memory access if the implementation allows it, but what would be the benefit if it turns out that by doing this the driver just swaps your whole computation to the host instead of the hardware your code is supposed to run on... – cli_hlt Oct 26 '11 at 19:45
  • 2
    @cli_hlt: You get to decide what device your task queues (and thus kernels) will run on, beforehand. The implementation has no option to decide something else later. Also, features like scattered writes or local memory are not something "special" that the hardware supports or does not support. It's just that under OpenGL the same hardware will not expose it, because OpenGL implements a graphics pipeline. As such, it _simply does not make sense_ to support writing to local memory in a pixel shader (and "historic" hardware could indeed not do that). Under OpenCL, it makes sense and is allowed. – Damon Oct 26 '11 at 19:56
  • 2
    ("it simply does not make sense" may be a somewhat too harsh wording, but you get what I mean. It is not what you usually want for graphics, and it is not what GPUs could do, say, a decade ago. OpenGL implements a "turn vertices and connectivity information into image" service. OpenCL implements a "crunch arbitrary data into some other data" service.) – Damon Oct 26 '11 at 20:00
  • @Damon But you can still run into situations where your API makes promises that the HW cannot fulfill. In these cases, you can still use the API, but internally the driver processes your request on the host HW instead of the Graphics HW. You only get to know if you run deep analysis of every bit that happens, and believe me or not, we got quite some insights by contacting vendors upon experiencing situations that we were unable to explain. – cli_hlt Oct 26 '11 at 20:04
  • 1
    You do know that the OS will kill the driver too if OpenCL does a lengthy calculation on the GPU? – Tara Mar 27 '13 at 13:37
  • @Tara Do you have a source that OpenCL gets killed after a specific time? How to recognize it? Many thanks in advance! – user7427029 Jan 17 '23 at 23:00
  • @user7427029 My comment is based on personal experience. If that's not reliable enough for you, you can look up "Nvidia watchdog timer" (for example: https://forums.developer.nvidia.com/t/watchdog-timer-kills-cuda-code/133603/3). The workload shouldn't matter, since this is an OS feature (OS lost contact with the GPU). I have to admit though, I am not sure if the watchdog always works reliably. I do remember situations where my PC would lock itself up regardless. The timeout is adjustable in the registry. Example: https://docs.nframes.com/troubleshooting/specific-issues/cuda-driver-timeout/ – Tara Feb 17 '23 at 04:00
6

Another major reason is that OpenGL/GLSL is supported only on graphics cards. Although many-core computing started out on graphics hardware, there are many hardware vendors working on many-core hardware platforms targeted at computation. For example, see Intel's Knights Corner.

Developing code for computation using OpenGL/GLSL will prevent you from using any hardware that is not a graphics card.
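
A sketch of the host-side flexibility OpenCL buys you here (error handling trimmed): ask for a GPU, and fall back to any CPU device if none is present.

```c
#include <CL/cl.h>
#include <stdio.h>

int main(void)
{
    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, NULL);

    /* Prefer a GPU, but any OpenCL device will do. */
    if (clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL) != CL_SUCCESS)
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &device, NULL);

    char name[256];
    clGetDeviceInfo(device, CL_DEVICE_NAME, sizeof(name), name, NULL);
    printf("Running on: %s\n", name);
    return 0;
}
```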

Tal Darom
  • 1,379
  • 1
  • 8
  • 26
  • I think OpenCL will also prevent my code from running efficiently on any hardware that is not a graphics card today. The kind of parallel computation favoured by OpenCL is well matched to GPUs but quite inefficient on today's vanilla CPUs. – dronus Dec 07 '11 at 20:38
4

Well, as of OpenGL 4.5, these are the features OpenCL 2.0 has that OpenGL 4.5 doesn't (as far as I could tell); this does not cover the features that OpenGL has that OpenCL doesn't:

Events

Better Atomics

Blocks

Work-group functions: work_group_all, work_group_any, work_group_broadcast, work_group_reduce and work_group_scan_inclusive/exclusive (see the sketch after this list)

Enqueue Kernel from Kernel

Pointers (though if you are executing on the GPU this probably doesn't matter)

A few math functions that OpenGL doesn't have (though you could construct them yourself in OpenGL)

Shared Virtual Memory

(More) Compiler Options for Kernels

Easy to select a particular GPU (or otherwise)

Can run on the CPU when no GPU

More support for niche hardware platforms (e.g. FPGAs)

On some (all?) platforms you do not need a window (and its context binding) to do calculations.

OpenCL allows just a bit more control over precision of calculations (including some through those compiler options).
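
As a small illustration of the work-group functions mentioned above (a sketch with hypothetical names; requires an OpenCL 2.0 device and building with `-cl-std=CL2.0`):

```c
// The runtime performs the reduction across the work-group; no
// hand-written __local scratch buffer or barrier loop is needed.
__kernel void group_sums(__global const float* in, __global float* out)
{
    float sum = work_group_reduce_add(in[get_global_id(0)]);
    if (get_local_id(0) == 0)
        out[get_group_id(0)] = sum;
}
```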

A lot of the above are mostly for better CPU - GPU interaction: Events, Shared Virtual Memory, Pointers (although these could potentially benefit other stuff too).

OpenGL has gained the ability to sort things into different areas of client and server memory since a lot of the other posts here were written. OpenGL now has better memory barrier and atomics support and allows you to allocate things to different registers within the GPU (to about the same degree OpenCL can). For example, you can now share registers within the local compute group in OpenGL (using something like the AMD GPUs' LDS (local data share)), though this particular feature only works with OpenGL compute shaders at this time. OpenGL has stronger, better-performing implementations on some platforms (such as the open-source Linux drivers). OpenGL also has access to more fixed-function hardware (as other answers have said). While it is true that fixed-function hardware can sometimes be avoided (e.g. Crytek uses a "software" implementation of a depth buffer), fixed-function hardware can manage memory just fine (and usually a lot better than someone who isn't working for a GPU hardware company could) and is just vastly superior in most cases. I must admit OpenCL has pretty good fixed-function texture support, which is one of the major OpenGL fixed-function areas.

I would argue that Intel's Knights Corner is an x86 GPU that controls itself. I would also argue that OpenCL 2.0, with its texture functions (which are actually available in lower versions of OpenCL), can be used to reach much the same level of performance that user2746401 suggested.

afree100
  • 191
  • 1
  • 8
2

The "feature" that OpenCL is designed for general-purpose computation, while OpenGL is for graphics. You can do anything in GL (it is Turing-complete) but then you are driving in a nail using the handle of the screwdriver as a hammer.

Also, OpenCL can run not just on GPUs, but also on CPUs and various dedicated accelerators.

2

OpenCL (as of version 2.0) describes a heterogeneous computational environment, where every component of the system can both produce and consume tasks generated by other system components. The notions of CPU, GPU (etc.) are no longer needed - you just have a Host and Device(s).

OpenGL, in contrast, has a strict division into the CPU, which is the task producer, and the GPU, which is the task consumer. That's not bad, as less flexibility allows greater performance. OpenGL is just a narrower-scope instrument.
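
A sketch of what that producer/consumer flexibility looks like in OpenCL 2.0 (hypothetical kernel; needs `-cl-std=CL2.0` and an on-device queue): a kernel acts as a task producer by enqueuing further work itself, with no round trip to the host.

```c
__kernel void producer(__global float* data, int n)
{
    // One work-item enqueues a follow-up range directly from the device.
    if (get_global_id(0) == 0) {
        enqueue_kernel(get_default_queue(),
                       CLK_ENQUEUE_FLAGS_WAIT_KERNEL,
                       ndrange_1D(n),
                       ^{ data[get_global_id(0)] *= 2.0f; });
    }
}
```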

Roman Arzumanyan
  • 1,784
  • 10
  • 10
2

In addition to the already existing answers, OpenCL/CUDA is not only a better fit for the computational domain, but also doesn't abstract away the underlying hardware too much. This way you can profit more directly from things like shared memory or coalesced memory access, which would otherwise be buried in the actual implementation of the shader (which itself is nothing more than a special OpenCL/CUDA kernel, if you like).

Though to profit from such things you also need to be a bit more aware of the specific hardware your kernel will run on; with a shader you shouldn't (and can't fully) take those things into account explicitly.

Once you do something more complex than simple level 1 BLAS routines, you will surely appreciate the flexibility and genericity of OpenCL/CUDA.
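
For example (a sketch with hypothetical names, assuming a 16x16 local work size): a tiled matrix transpose that stages data in `__local` memory so that both the global reads and the global writes stay coalesced, exactly the kind of detail a fragment shader gives you no direct handle on.

```c
#define TILE 16

__kernel void transpose(__global const float* in, __global float* out,
                        int width, int height)
{
    __local float tile[TILE][TILE + 1];        // +1 column avoids bank conflicts

    int x = get_group_id(0) * TILE + get_local_id(0);
    int y = get_group_id(1) * TILE + get_local_id(1);
    if (x < width && y < height)
        tile[get_local_id(1)][get_local_id(0)] = in[y * width + x];

    barrier(CLK_LOCAL_MEM_FENCE);

    // Swap tile coordinates so writes to the transposed matrix are contiguous.
    x = get_group_id(1) * TILE + get_local_id(0);
    y = get_group_id(0) * TILE + get_local_id(1);
    if (x < height && y < width)
        out[y * height + x] = tile[get_local_id(0)][get_local_id(1)];
}
```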

Christian Rau
  • 45,360
  • 10
  • 108
  • 185
  • 1
    I'm not sure about 'but also doesn't abstract away the underlying hardware too much'. It seems OpenCL in fact totally ignores parts of the hardware, for example the rasterization units. – dronus Dec 07 '11 at 20:40
  • @dronus Well, yes it ignores the fixed-function parts. But on the other hand shaders abstract away the many-core nature of the hardware and such things as the different memory types and optimized memory accesses. – Christian Rau Dec 07 '11 at 20:43
  • 1
    Rasterization even enables some kind of random memory access (to "triangularly connected" regions...) with a guaranteed outcome (fragments overwritten in order of z depth). Thinking in kernels and memory streams, emulating such behaviour would mean random access with well-defined ordered mutexes among all parallel threads, or something similar. What is a usable OpenCL idiom for parallel random access like this? – dronus Dec 07 '11 at 20:52
0

One thought is to write your program in both and test them with respect to your priorities.

For example: if you're processing a pipeline of images, maybe your implementation in OpenGL or OpenCL will turn out to be faster than the other.

Good luck.

Adam
  • 77
  • 2
  • 12