I am trying to use OpenMP to offload computation to an AMD GPU. I have read in the OpenMP 4.5 specification that a target device is the device onto which code and data may be offloaded, but I cannot tell whether the offload succeeded, or whether the code actually ran on my AMD GPU.
To test whether the offloading works, I measured the wall time with and without the pragmas and compared the two, but the time reported in both cases is 0.
This is the simple code I used for the test; I plan to apply the same approach in my project:
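#include <cstdlib>
#include <iostream>
#include <omp.h>

int main() {
    int n = 10240; float a = 2.0f; float b = 3.0f;
    float *x = (float*) malloc(n * sizeof(float));
    float *y = (float*) malloc(n * sizeof(float));
    // Initialize the inputs so the kernels read defined values (placeholder values).
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 0.0f; }
    double start = omp_get_wtime();
    // x and y are pointers, so map array sections rather than the pointers themselves.
    #pragma omp target data map(to: x[0:n])
    {
        // First kernel: distributed across teams and threads on the device.
        #pragma omp target map(tofrom: y[0:n])
        #pragma omp teams
        #pragma omp distribute parallel for
        for (int i = 0; i < n; ++i) {
            y[i] = a*x[i] + y[i];
        }
        // Second kernel: offloaded, but executed serially on the device.
        #pragma omp target map(tofrom: y[0:n])
        for (int i = 0; i < n; ++i) {
            y[i] = b*x[i] + y[i];
        }
    }
    std::cout << "Time: " << (omp_get_wtime() - start) * 1000.0 << " ms" << std::endl;
    free(x); free(y); return 0;
}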
NB: I am using GCC 5.1.0 on Windows.
Any help would be much appreciated.