I am trying to use OpenMP offloading to the GPU on my local machine, which is equipped with a GTX 1060 graphics card. All of my CUDA and cuBLAS examples run just fine. However, when I try OpenMP offloading, it simply does not work. To get OpenMP 5.0 support, I compiled the GCC 10.2.0 toolchain myself. After some debugging, I found that the OpenMP runtime does not see any devices. For example, this code prints zero:
#include <omp.h>
#include <stdio.h>

int main(void) {
    /* Number of non-host devices the OpenMP runtime can offload to */
    printf("%d\n", omp_get_num_devices());
    return 0;
}
However, the NVIDIA driver and CUDA toolchain are up and running:
$ nvidia-smi
Sun Feb 21 23:06:40 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06 Driver Version: 450.51.06 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 106... Off | 00000000:1D:00.0 Off | N/A |
| 0% 37C P8 12W / 200W | 584MiB / 6075MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
So what am I missing? How can I get the OpenMP runtime to find the device?
EDIT:
Here is the information about my compiler:
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/opt/gcc/10.2.0/libexec/gcc/x86_64-pc-linux-gnu/10.2.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ./configure --prefix=/opt/gcc/10.2.0/
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.2.0 (GCC)
The code was compiled with the following command:
gcc -fopenmp simple.c