clang compiler being able to offload OpenMP region to GPU

Question

I read that the clang compiler can offload OpenMP regions to GPUs. However, I am confused about how to compile the code with clang. The clang version installed on our cluster is 3.9.0 (tags/RELEASE_390/final 288133). The code I want to offload is basically a matrix-matrix multiplication:

#pragma omp target parallel for  shared(C,P,T) private(i,j,k)
  for (i=0; i<N; i++) {
    for (j=0; j<N; j++) {
      for (k=0; k<N; k++) {
        C[i][j] += P[i][k]*T[k][j];
      }
    }
  }

I am compiling with

clang -O3 -fopenmp-targets=x86_64-unknown-linux-gnu mm.c
clang-3.9: warning: argument unused during compilation: '-fopenmp-targets=x86_64-unknown-linux-gnu'

What I don't know is whether my installed version of clang is able to offload code to GPUs and, if so, how I could do it. Any comment is welcome.

score 2 · Answer 1 · edited Sep 06 '19 at 22:29

Short answer: According to the Clang 3.9.0 Release Notes it is not:

Currently Clang supports OpenMP 3.1 and all non-offloading features of OpenMP 4.0/4.5. Offloading features are under development.

I couldn't find anything in the newer release notes on whether it has since been officially added in a newer version. There seems to be some support, though. The libomptarget documentation lists two plugins:

- generic 64-bit, for the following targets:
  - powerpc64-ibm-linux-gnu
  - powerpc64le-ibm-linux-gnu
  - x86_64-pc-linux-gnu
- cuda, for the following targets (see this):
  - nvptx-nvidia-cuda
  - nvptx64-nvidia-cuda

Maybe this document helps you if you want to try building your own Clang.

A list of OpenMP implementations can be found here. GCC 6.1 and newer fully support OpenMP 4.5. Offloading seems to be supported by GCC 5 and later, but for Nvidia GPUs only via OpenACC, see here.

In general, support for the target construct, i.e. offloading, is not the same as GPU (or any specific device) support, since the latter requires a device-specific implementation for each device. The trivial case is offloading to the host device, e.g. an x86 CPU, which isn't beneficial for anything but testing or compatibility in the absence of accelerator devices.
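
To see which case applies on a given machine, a small check along these lines may help. This is only a sketch: the function names are hypothetical, and it assumes a compiler and runtime that accept the target construct. omp_get_num_devices() and omp_is_initial_device() are standard OpenMP runtime routines. The _OPENMP guards let the file still build when OpenMP is disabled.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: number of non-host devices the runtime reports.
   Returns 0 when the file is compiled without OpenMP. */
int available_devices(void) {
#ifdef _OPENMP
    return omp_get_num_devices();
#else
    return 0;
#endif
}

/* Hypothetical helper: returns 1 if a target region executed on the host
   (the fallback case), 0 if it executed on an accelerator device. */
int target_region_ran_on_host(void) {
    int on_host = 1; /* stays 1 when compiled without OpenMP */
#ifdef _OPENMP
    #pragma omp target map(tofrom: on_host)
    {
        /* Evaluated wherever the region actually runs. */
        on_host = omp_is_initial_device() ? 1 : 0;
    }
#endif
    return on_host;
}
```

If available_devices() reports 0, or target_region_ran_on_host() returns 1, the target region was accepted by the compiler but nothing was actually moved to a GPU.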

Hope that helps.

score 2 · Answer 2 · answered Feb 13 '19 at 00:31

Since version 7, clang supports OpenMP offloading to Nvidia GPUs (http://releases.llvm.org/7.0.0/tools/clang/docs/ReleaseNotes.html#id12):

Clang gained basic support for OpenMP 4.5 offloading for NVPTX target.

To compile your program for NVPTX target use the following options: -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda for 64 bit platforms or -fopenmp -fopenmp-targets=nvptx-nvidia-cuda for 32 bit platform.
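
Applied to the matrix multiplication from the question, this could look roughly like the sketch below. It is not the asker's exact code: the function name matmul_offload, the flat row-major arrays, the explicit map clauses, and the teams distribute combined construct are my own assumptions for illustration. The compile line in the comment uses only the options quoted above.

```c
/* Possible compile line (Clang 7 or newer, built with NVPTX offloading support):
 *   clang -O3 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda mm.c
 */

/* Hypothetical helper: computes C += P * T for row-major n x n matrices.
   The map clauses copy P and T to the device and copy C both ways.
   If no device is available, the region falls back to the host. */
void matmul_offload(int n, const double *P, const double *T, double *C) {
    #pragma omp target teams distribute parallel for collapse(2) \
        map(to: P[0:n*n], T[0:n*n]) map(tofrom: C[0:n*n])
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = C[i * n + j]; /* accumulate locally, write once */
            for (int k = 0; k < n; k++) {
                sum += P[i * n + k] * T[k * n + j];
            }
            C[i * n + j] = sum;
        }
    }
}
```

The loop variables are declared inside the loops, so no private clause is needed, and the explicit map clauses state which arrays are transferred and in which direction.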
