I read that the Clang compiler can offload OpenMP regions to GPUs. However, I am confused about how to compile the code with Clang. The Clang version installed on our cluster is 3.9.0 (tags/RELEASE_390/final 288133). The code I want to offload is basically a matrix-matrix multiplication:
#pragma omp target parallel for shared(C,P,T) private(i,j,k)
for (i=0; i<N; i++) {
    for (j=0; j<N; j++) {
        for (k=0; k<N; k++) {
            C[i][j] += P[i][k]*T[k][j];
        }
    }
}
I am compiling with the command below, which produces the warning shown on the line after it:
clang -O3 -fopenmp-targets=x86_64-unknown-linux-gnu mm.c
clang-3.9: warning: argument unused during compilation: '-fopenmp-targets=x86_64-unknown-linux-gnu'
What I don't know is whether my installed version of Clang is able to offload code to GPUs and, if so, how I could do it. Any comments are welcome.
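For reference, a self-contained version of the kernel might look like the sketch below. The function name matmul, the parameter n, the flattened row-major layout, and the explicit map clauses are assumptions added for illustration; they are not part of my actual code. If the pragma is ignored or no offload target is available, the loop should simply run on the host.

```c
/* Hypothetical helper (not from the original code): computes C += P * T
 * for n x n matrices stored row-major in flat arrays. */
void matmul(int n, const double *P, const double *T, double *C)
{
    /* Assumed data mapping: P and T are copied to the device, C is copied
     * to the device and back. Without OpenMP support the pragma is ignored
     * and the loops run serially on the host. */
    #pragma omp target parallel for map(to: P[0:n*n], T[0:n*n]) map(tofrom: C[0:n*n])
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0; /* per-iteration accumulator, private by scope */
            for (int k = 0; k < n; k++) {
                sum += P[i * n + k] * T[k * n + j];
            }
            C[i * n + j] += sum;
        }
    }
}
```

Calling matmul(2, P, T, C) on small hand-filled arrays is a quick way to check that the result is the same whether or not the region is actually offloaded.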