GPU Programming, CUDA or OpenCL or?

Question

What is the best way to program for GPUs?

I know:

CUDA is very good, has a lot of developer support and is very nice to debug, but it only runs on NVidia hardware.
OpenCL is very flexible: it runs on NVidia, AMD and Intel hardware, and on accelerators, GPUs and CPUs, but as far as I know it is no longer supported by NVidia.
Coriander (https://github.com/hughperkins/coriander), which converts CUDA to OpenCL.
HIP (https://github.com/ROCm-Developer-Tools/HIP) is made by AMD so that code can be written once and built for both AMD and NVidia (CUDA) hardware. It can also convert CUDA to HIP.

OpenCL would be my preferred way, since I want to be very flexible in hardware support. But if it is no longer supported by NVidia, that is a knockout. HIP then sounds best to me, with different release files per platform. But how will the support for Intel's upcoming hardware be?

Are there any other options? Important for me are broad hardware support, long-term support (so that the code can still be compiled in some years) and vendor independence. Additionally, it should be usable with more than one compiler and supported on Linux and Windows.

For broad support, use a library with different backends instead of direct GPU programming (if this is possible for your requirements). CUDA is more modern and stable than OpenCL and has very good backwards compatibility. Nvidia is more focused on General Purpose GPU Programming, AMD is more focused on gaming. Most GPU programming is done on CUDA. Usually you won't get more than one compiler for GPU programming in any 'language'. — Sebastian, May 11 '22 at 16:21
If you tell us more about your requirements (e.g. what kind of algorithms: imaging, neural networks, big data, bioinformatics, physics simulation, embedded, gaming, supercomputers), we can give a more focused reply. Are your algorithms very simple to parallelize (the same basic calculations on a large set of data), or do they have totally different program flows, need lots of memory bandwidth, or work on complicated data structures (trees or something even more complex) instead of simple arrays? Do the execution threads have to interact to solve the task? — Sebastian, May 11 '22 at 16:21
Do you want flexibility in where it runs just to be future-proof and able to switch the underlying hardware, or because your product needs to run on the different hardware your customers have installed now (which hardware specifically, including which GPU generations)? Is implementing the algorithm in a different way for different platforms an option? Or is it a research project? — Sebastian, May 11 '22 at 16:23
Currently it is a research project, but it may become commercial, so it should be flexible in hardware. There is another option that works only with the Microsoft compiler, which I don't want either. Why is the kind of algorithm important? I think CUDA and OpenCL support any kind of algorithm. A very specialized library is always a problem if you need something additional. — Matthias F., May 11 '22 at 17:38
It would be a bit surprising if Nvidia stopped supporting OpenCL, and I found nothing saying that Nvidia would do that. As an alternative, there is OpenMP if you only need simple operations and performance is not critical. Otherwise there is OpenACC, which is a bit more mature and efficient. Recently, SYCL has been created. I do not know much about it, but it looks like a higher-level OpenCL. There are multiple implementations available. One of them is DPC++, made by Intel, which claims to be cross-architecture (it supports OpenCL and CUDA as backends). See https://khronos.org/sycl — Jérôme Richard, May 11 '22 at 17:47
While OpenCL and CUDA are not really restrictive, higher-level frameworks that are meant to be portable can be. And even when they support most of the features provided by CUDA/OpenCL, not all features are as efficient. CUDA enables you to write low-level, very efficient code and has very good library support (see NPP, cuBLAS, cuSPARSE, CUTLASS, cuDNN, etc.), but it is (almost) only for Nvidia cards. That being said, if your operations are basic, you do not need all of that, and not having it is not a problem. Being low-level is not an advantage in such a case either. — Jérôme Richard, May 11 '22 at 17:57
For some algorithms, the language is not so important, as it is more or less the same in each. For others, the language can be a limiting factor. OpenCL support for Nvidia _and_ for AMD was always worse than CUDA support for Nvidia in terms of compatibility, supported features and user numbers. — Sebastian, May 11 '22 at 17:59
There is also "Vulkan Kompute" by Khronos. No idea how it compares to OpenCL. — paleonix, May 11 '22 at 18:54
And similar to SYCL (i.e. high level, hardware agnostic) there is also Kokkos. — paleonix, May 11 '22 at 18:56

score 9 · Accepted Answer · answered May 11 '22 at 18:00

Nvidia won't cancel OpenCL support anytime soon.

A newly emerging approach for portable code on GPU is SYCL. It enables higher level programming from a single source file that is then compiled twice, once for the CPU and once for GPU. The GPU part then runs on GPU via either OpenCL, CUDA or some other backend.
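To make the single-source idea concrete, here is a minimal sketch of an element-wise vector addition written against the SYCL 2020 API. The function name `vector_add` and the `HAVE_SYCL` macro are made up for this illustration. The plain loop in the `#else` branch is not part of SYCL; it is only an assumption added so the sketch still compiles with an ordinary C++17 compiler when no SYCL toolchain is installed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Use SYCL only if its header is available (illustrative guard, not part of SYCL).
#if __has_include(<sycl/sycl.hpp>)
#include <sycl/sycl.hpp>
#define HAVE_SYCL 1
#else
#define HAVE_SYCL 0
#endif

// Hypothetical helper: returns c with c[i] = a[i] + b[i].
std::vector<float> vector_add(const std::vector<float>& a,
                              const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> c(a.size());
    if (c.empty()) return c;  // nothing to compute
#if HAVE_SYCL
    sycl::queue q;  // default device selection (GPU, CPU, ... depends on the runtime)
    {
        // Buffers wrap the host memory for the duration of this scope.
        sycl::buffer<float, 1> buf_a(a.data(), sycl::range<1>(a.size()));
        sycl::buffer<float, 1> buf_b(b.data(), sycl::range<1>(b.size()));
        sycl::buffer<float, 1> buf_c(c.data(), sycl::range<1>(c.size()));

        q.submit([&](sycl::handler& h) {
            sycl::accessor in_a(buf_a, h, sycl::read_only);
            sycl::accessor in_b(buf_b, h, sycl::read_only);
            sycl::accessor out_c(buf_c, h, sycl::write_only, sycl::no_init);
            // The lambda body is the device code: one work-item per element.
            h.parallel_for(sycl::range<1>(c.size()), [=](sycl::id<1> i) {
                out_c[i] = in_a[i] + in_b[i];
            });
        });
    }  // buf_c is destroyed here, which waits for the kernel and copies results into c
#else
    // Fallback without SYCL: the same computation as a plain host loop.
    for (std::size_t i = 0; i < c.size(); ++i) {
        c[i] = a[i] + b[i];
    }
#endif
    return c;
}
```

Host code and device code live in the same file, and the lambda passed to `parallel_for` is the part that gets compiled for the device. Which backend actually executes it (OpenCL, CUDA or something else) depends on the SYCL implementation and how it was configured, not on this source.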

As of right now, however, the best-supported GPU framework across platforms is OpenCL 1.2, which is very well established at this point. With it, your code runs on 10-year-old GPUs, on the latest and fastest data-center GPUs, on gaming and workstation GPUs, and even on CPUs if you need more memory. On Nvidia GPUs there is no performance/efficiency tradeoff at all compared to CUDA; it runs just as fast.

Porting tools like HIP are great if you already have a large code base, but performance could suffer. My advice is to pick one framework and stay fully committed to it, rather than using a tool to generate a possibly poorly optimized port.

If you choose to start with OpenCL, have a look at this OpenCL-Wrapper. The native OpenCL C++ bindings are a bit cumbersome to use, and this lightweight wrapper simplifies learning a lot, while keeping functionality and full performance.

Good that I asked ... I didn't know SYCL 2020, and it looks very good. It might become the standard in the next few years. I searched again but couldn't find it anymore; good that OpenCL is still supported by NVidia. Thanks a lot. Now I'm looking into using SYCL in Visual Studio. — Matthias F., May 12 '22 at 12:38
