I am trying to understand/test OpenMP with GPU offload. However, I am confused because some examples/info (1, 2, 3) in the internet are analogous or similar to mine but my example does not work as I think it should. I am using g++ 9.4
on Ubuntu 20.04 LTS
and also installed gcc-9-offload-nvptx
.
My example that does not work but is similar to this one:
#include <iostream>
#include <vector>
int main(int argc, char *argv[]) {
typedef double myfloat;
if (argc != 2) exit(1);
size_t size = atoi(argv[1]);
printf("Size: %zu\n", size);
std::vector<myfloat> data_1(size, 2);
myfloat *data1_ptr = data_1.data();
myfloat sum = -1;
#pragma omp target map(tofrom:sum) map(from: data1_ptr[0:size])
#pragma omp teams distribute parallel for simd reduction(+:sum) collapse(2)
for (size_t i = 0; i < size; ++i) {
for (size_t j = 0; j < size; ++j) {
myfloat term1 = data1_ptr[i] * i;
sum += term1 / (1 + term1 * term1 * term1);
}
}
printf("sum: %.2f\n", sum);
return 0;
}
When I compile it with: g++ main.cpp -o test -fopenmp -fcf-protection=none -fno-stack-protector
I get the following
stack_example.cpp: In function ‘main._omp_fn.0.hsa.0’:
cc1plus: warning: could not emit HSAIL for the function [-Whsa]
cc1plus: note: support for HSA does not implement non-gridified OpenMP parallel constructs.
It does compile but when using it with
./test 10000
the printed sum is still -1. I think the sum value passed to the GPU was not returned properly but I explicitly map it, so shouldn't it be returned? Or what am I doing wrong?
EDIT 1
I was ask to modify my code because there was a historically grown redundant for loop and also sum was initialized with -1. I fixed that and also compiled it with gcc-11 which did not throw a warning or note as did gcc-9. However the behavior is similar:
Size: 100
Number of devices: 2
sum: 0.00
I checked with nvtop
, the GPU is used. Because there are two GPUs I can even switch the device and can be seen by nvtop
.
Solution:
The fix is very easy and stupid. Changing
map(from: data1_ptr[0:size])
to
map(tofrom: data1_ptr[0:size])
did the trick.
Even though I am not writing to the array this seemed to be the problem.