
The situation is this: a set of calculations (written in C++) would be best done using whatever GPU is available on the user's system, or the CPU if no such GPU exists. The user's system is not the build machine and will be of an unknown (but pretty standard desktop) configuration.

Currently, I can write code using Thrust but (unless I misunderstand) at the point of actually building it, the target is set (broadly speaking, Nvidia or just CPU), and the binary will only make use of GPU hardware on the user's machine (and possibly won't even run otherwise) if it's from the same GPU family as the binary was built for.

What I would like, in a magical ideal world, is a binary that will identify what (if any) GPU family (Nvidia, ATI, or fallback to just plain CPU) is available on the machine it's running on, and make use of it. Having to build three separate versions and make sure each user gets the right one for their particular machine is a non-starter. The targets are pretty standard desktops running Windows, Linux and Solaris; let's put the OS aside, though, as having a different build for each of those three is completely acceptable. The aim is to have a binary for each of those three targets that identifies and uses the available GPU by itself, whether it's Nvidia, ATI or just plain CPU.

I have been typing some hopeful terms into Google but not found anything addressing this yet; for all I know it's a completely solved problem and I've just not typed the right words.

Can anyone tell me what (if any) the standard way to do this is?

Edits: removed bad Thrust information, added note about ultimate target hardware

Moschops
  • There is no such thing as thrust for ATI. It is a CUDA only template library. On NVIDIA platforms, you can build for as many CUDA architectures as you want and GPU architecture selection is automagic, you don't need to do anything – talonmies Jul 19 '15 at 10:41
  • Also, the questions that nvidia-nsight asks are for projects that use CUDA functions directly – Behrooz Jul 19 '15 at 10:49
  • This is really a question about recommending an off-site resource (a GPU programming language/environment). – Puppy Jul 19 '15 at 11:41
  • "This is really a question about recommending an off-site resource" Gosh, is it? I thought it was a question about what the current best practice is. I can find heaps of off-site resources about GPU programming by myself. What I can't find is how other people are managing the need to target multiple end-user systems. – Moschops Jul 19 '15 at 13:13

2 Answers


AFAIK, the only standard working on both AMD/ATI & Nvidia GPGPUs is OpenCL; CUDA is an Nvidia proprietary technology. Of course, it requires an OpenCL implementation to be available on the target machine (and worth using).

You might be interested in PIPS4U, OpenACC, OpenMP, or even MPI, etc. Also look into multithreading, e.g. with C++11 thread support.

Notice that some computers (e.g. many servers) don't have any GPGPUs. Others may have a GPGPU which is slower than their CPU, so it is not worth using for computing tasks. Some AMD chips have an APU; others have HSA. Heterogeneous computing through GPGPUs is not well standardized.

Be aware that GPGPU software has to be tuned (or configured) to the particular hardware it is running on. This is why it is difficult to code efficient OpenCL (or CUDA) software.

BTW, there might be some weird configurations. Imagine a system with a processor with an on-chip GPU (e.g. Intel i7-4770K, Intel i5-5775C), one high-end AMD GPGPU & graphics card, and another high-end NVIDIA GPGPU & graphics card. You might want to run OpenCL on all three of them.

You could consider some plugin architecture (e.g. using dlopen & dlsym from POSIX). You could consider some C++ framework (Qt, POCO) providing interfaces to plugins in an OS independent way.

But there is no silver bullet; perhaps a good approach might be to define and publicly document a plugin architecture (e.g. plugin naming and calling conventions), then publish, as free software, several implementations of similar plugins fitting into that architecture (above OpenCL, CUDA, OpenACC, OpenMP, MPI, ...). Be aware of name mangling: declare the public functions of your plugins as extern "C". You might mix that with some metaprogramming approach, your application generating C++ (and/or OpenCL, etc.) code at runtime, then compiling and loading it as a plugin (on the user's machine).

Basile Starynkevitch
  • Sure, but does OpenCL set the target hardware at compile-time or run-time? From what I read, I think it might do it at run-time, with a system-specific set of OpenCL libraries that the user will have to have installed. – Moschops Jul 19 '15 at 10:47
  • It's hardware agnostic. Works as long as the driver supports OpenCL – Behrooz Jul 19 '15 at 10:48
  • It is not an answer. OpenCL will not work if the machine has no OpenCL Implementation installed, which is normal for "pretty standard" configuration. – stgatilov Jul 19 '15 at 10:48
  • @Moschops OpenCL doesn't set anything, you get a list of devices and send commands to whichever you like. – RamblingMad Jul 19 '15 at 10:48
  • @stgatilov It's certainly true that the ideal would be for a "pretty standard" configuration, but in absence of a better answer, thus far I'll take "OpenCL but you have to distribute some OpenCL libraries that the users will need to install on their specific machine" if it's the best we can do today. – Moschops Jul 19 '15 at 10:51
  • @stgatilov it's no different to depending on a third-party library. – RamblingMad Jul 19 '15 at 10:51
  • AFAIK nvidia, intel and amd all ship opencl libraries with their graphics drivers. on linux you can install them separately. – Behrooz Jul 19 '15 at 10:52

I think this is worth a try: Load shared library by path at runtime
For it to work, you should separate your application into frontend and backend parts: compile the backend for each family, and have the frontend load the proper backend at runtime.

Behrooz