
I am trying to use OpenMP to offload to an AMD GPU. I have read in the OpenMP 4.5 specification that a target device is the device onto which code and data may be offloaded, but I cannot tell whether the offloading succeeded, or whether the code was actually offloaded to my AMD GPU.

To test whether the offloading works, I measured the wall time with and without the pragmas to compare the two, but the time reported in both cases is 0.

This is the simple code I used for the test; I intend to use the same approach in my project:

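#include <cstdlib>
#include <iostream>
#include <omp.h>

int main() {
    int n = 10240; float a = 2.0f; float b = 3.0f;
    float *x = (float*) malloc(n * sizeof(float));
    float *y = (float*) malloc(n * sizeof(float));
    // Initialize the inputs; malloc leaves the buffers uninitialized.
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }
    double start = omp_get_wtime();
    // Map array sections: mapping the bare pointers x and y would not transfer the data they point to.
    #pragma omp target data map(to: x[0:n])
    {
        #pragma omp target map(tofrom: y[0:n])
        #pragma omp teams
        #pragma omp distribute parallel for
        for (int i = 0; i < n; ++i) {
            y[i] = a*x[i] + y[i];
        }
        // No teams/parallel construct here, so this loop runs serially inside the target region.
        #pragma omp target map(tofrom: y[0:n])
        for (int i = 0; i < n; ++i) {
            y[i] = b*x[i] + y[i];
        }
    }
    std::cout << "Time: " << (omp_get_wtime() - start) * 1000.0 << " ms" << std::endl;
    free(x); free(y); return 0;
}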
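
One way to check from inside the program whether a target region really left the host is sketched below. It assumes the compiler and runtime provide the OpenMP 4.0 device routines `omp_get_num_devices()` and `omp_is_initial_device()`. The helper names `target_region_ran_on_device` and `report_offload_status` are made up for illustration, and the `#else` stubs are an assumption added only so that the sketch still builds when OpenMP is disabled.

```cpp
#include <cstdio>

#ifdef _OPENMP
#include <omp.h>
#else
// Assumed fallbacks (not part of OpenMP): with OpenMP disabled there are
// no offload devices and everything runs on the host.
static int omp_get_num_devices() { return 0; }
static int omp_is_initial_device() { return 1; }
#endif

// Hypothetical helper: returns true only if the target region below
// executed on an offload device rather than falling back to the host.
bool target_region_ran_on_device() {
    int on_host = 1;
    // The flag is written inside the region and copied back afterwards.
    #pragma omp target map(from: on_host)
    {
        on_host = omp_is_initial_device();
    }
    return on_host == 0;
}

// Hypothetical helper: prints what the runtime reports.
void report_offload_status() {
    std::printf("offload devices visible: %d\n", omp_get_num_devices());
    std::printf("target region ran on the %s\n",
                target_region_ran_on_device() ? "device" : "host");
}
```

Calling `report_offload_status()` at the start of `main` should show whether the runtime sees any device at all. If the device count is 0, or the region reports "host", the target regions are most likely being executed by the host fallback.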

NB: I am using GCC 5.1.0 on Windows.

Any help would be much appreciated.

Ilya Verbin
Hamza
  • GCC has limited support for offloading; find out more [here](https://gcc.gnu.org/wiki/Offloading) – Piotr Skotnicki May 25 '16 at 20:30
  • If the time is `0`, there's something seriously wrong. The compiler could just optimize away all the code, because you do nothing with the result, but even then I would not expect `0` as the time, because `omp_get_wtime` actually takes some time on its own. – Zulan May 25 '16 at 21:08
  • GCC only supports Intel MIC as an OpenMP offload target. Check out [that](http://stackoverflow.com/questions/29297814/openmp-4-0-in-gcc-offload-to-nvidia-gpu) question. – Hristo Iliev May 26 '16 at 08:26
  • @HristoIliev, GCC 6 supports HSA as offload target: https://gcc.gnu.org/gcc-6/changes.html#hsa – Ilya Verbin May 26 '16 at 14:51
  • Try to set `GOMP_DEBUG=1` and see if this provides some insight. – Zulan May 26 '16 at 16:11
  • @Zulan I've searched for how to use `GOMP_DEBUG=1` but couldn't find out how. Can you please explain how to use it? Thanks. – Hamza May 26 '16 at 17:02
  • @Hamza, 1. Verify that your GPU supports HSA. 2. Install GCC 6.1 with special configure options (see link in my previous comment). 3. (Probably) switch to Linux. – Ilya Verbin May 26 '16 at 22:51
  • @IlyaVerbin, interesting. Do you happen to know what the state of that HSA support is? I'm not sure if "GCC can now generate HSAIL for _simple OpenMP device constructs_" means that it supports the full range of OpenMP 4 device constructs when not combined or it means that the support is still quite limited. – Hristo Iliev May 27 '16 at 09:46
  • @HristoIliev, AFAIK all OpenMP 4 device constructs are supported, but not all OpenMP constructs inside target regions are supported, because it's not possible to (efficiently) execute them on GPU. You can ask Martin Jambor for details, he knows everything about HSA in GCC :) https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00519.html – Ilya Verbin May 27 '16 at 11:05
  • @IlyaVerbin I have an AMD Radeon HD 7400M Series, and according to [link](https://en.wikipedia.org/wiki/Heterogeneous_System_Architecture#Software_support) AMD Radeon in general supports HSA, but I couldn't find anything on AMD's website. Please correct me if I'm wrong. Thanks a lot. – Hamza May 27 '16 at 12:46
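
On the zero timing raised in the comments above: a minimal sketch of a measurement that keeps the result observable, so the compiler has no reason to discard the loop. It cross-checks the wall time with `std::chrono::steady_clock` instead of `omp_get_wtime`. The function name `time_saxpy_ms` and the checksum out-parameter are hypothetical and chosen only for this illustration. If OpenMP is disabled or no device is available, the pragma is ignored or falls back to the host, so this sketch does not prove by itself that offloading happened.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: runs y = a*x + y once over n elements and returns
// the elapsed wall time in milliseconds. The sum of y is returned through
// `checksum` so the computed values are actually used.
double time_saxpy_ms(std::size_t n, float a, double &checksum) {
    std::vector<float> x(n, 1.0f), y(n, 2.0f);
    float *xp = x.data();
    float *yp = y.data();

    auto start = std::chrono::steady_clock::now();
    // Array sections transfer the data, not just the pointer values.
    #pragma omp target teams distribute parallel for map(to: xp[0:n]) map(tofrom: yp[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
    auto stop = std::chrono::steady_clock::now();

    // Consume the result on the host after the region has finished.
    checksum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        checksum += yp[i];
    }
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Printing both the returned time and the checksum makes it easier to tell a real measurement from a loop that was never executed. For such a small array the run is dominated by transfer and launch overhead, so a larger `n` or several repetitions give a more meaningful comparison.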
