Can I use `omp_get_thread_num()` on the GPU?

Question

I have OpenMP code that works on the CPU by having each thread manage memory indexed by its thread id, obtained from `omp_get_thread_num()`. Can the same approach work on the GPU?

An MWE is:

#include <iostream>
#include <omp.h>

int main(){
  const int SIZE = 400000;

  int *m;
  m = new int[SIZE];

  #pragma omp target
  {
    #pragma omp parallel for
    for(int i=0;i<SIZE;i++)
      m[i] = omp_get_thread_num();
  }

  for(int i=0;i<SIZE;i++)
    std::cout<<m[i]<<"\n";
}

Z boson · Answer 1 · 2018-03-14T12:32:39.950

It works fine on the GPU for me with GCC. You need to map `m`, though, e.g. like this:

#pragma omp target map(tofrom:m[0:SIZE])
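Combined with the loop from the question, the mapped version might look like the sketch below. The function name `record_thread_ids` and the serial fallback for builds without `-fopenmp` are illustrative additions, not part of the original code; the `map(tofrom: ...)` clause is the one shown above.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Assumption: serial fallback so the sketch still builds without -fopenmp.
static int omp_get_thread_num() { return 0; }
#endif

// Hypothetical helper: out[i] holds the id of the thread that ran iteration i.
std::vector<int> record_thread_ids(int size) {
  assert(size > 0);  // keep the mapped array section non-empty
  std::vector<int> out(size, -1);
  int *m = out.data();  // the map clause takes a raw pointer plus an array section

  // Copy m[0..size) to the device before the region and back to the host after it.
  #pragma omp target map(tofrom: m[0:size])
  {
    #pragma omp parallel for
    for (int i = 0; i < size; i++)
      m[i] = omp_get_thread_num();
  }
  return out;
}
```

If no offload device is available, the target region runs on the host, so the same function should still fill every element with a valid thread id.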

I compiled like this

g++ -O3 -Wall -fopenmp -fno-stack-protector so.cpp

You can see an example for a system without offloading here:

http://coliru.stacked-crooked.com/a/1e756410d6e2db61

A method I use to find out the number of teams and threads before doing work is this:

#pragma omp target teams defaultmap(tofrom:scalar)
{
    nteams = omp_get_num_teams();
    #pragma omp parallel
    #pragma omp single
    nthreads = omp_get_num_threads();
}

On my system (GCC 7.2, Ubuntu 17.10, gcc-offload-nvptx, GTX 1060) I get nteams = 30 and nthreads = 8. See this answer where I do a custom reduction for a target region using threads and teams. With -foffload=disable I get nteams = 1 and nthreads = 8 (4-core/8-hardware-thread CPU).
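The snippet above leaves out the declarations of `nteams` and `nthreads`. A self-contained version could look roughly like this; the function name `query_teams_and_threads` and the serial fallbacks are assumptions added for illustration, while the pragmas follow the snippet as written.

```cpp
#include <cassert>
#include <utility>
#ifdef _OPENMP
#include <omp.h>
#else
// Assumption: serial fallbacks so the sketch still builds without -fopenmp.
static int omp_get_num_teams() { return 1; }
static int omp_get_num_threads() { return 1; }
#endif

// Hypothetical helper: returns {number of teams, threads per team}
// as reported from inside a target region.
std::pair<int, int> query_teams_and_threads() {
  int nteams = 0, nthreads = 0;

  // defaultmap(tofrom:scalar) copies the scalars back to the host;
  // without it they would be firstprivate and the results would be lost.
  #pragma omp target teams defaultmap(tofrom:scalar)
  {
    nteams = omp_get_num_teams();
    #pragma omp parallel
    #pragma omp single
    nthreads = omp_get_num_threads();
  }

  assert(nteams >= 1 && nthreads >= 1);  // sanity check: the region ran
  return {nteams, nthreads};
}
```

As in the original snippet, every team writes the same values to the shared scalars. The numbers returned depend on the compiler, the device, and whether offloading is enabled.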

I added -fopt-info to the compile options and I get only the message

note: basic block vectorized

score 1 · Answer 2 · answered Dec 23 '17 at 19:31

The answer seems to be no.

Compiling with PGI using:

pgc++ -fast -mp -ta=tesla,pinned,cc60 -Minfo=all test2.cpp

gives:

13, Parallel region activated
    Parallel loop activated with static block schedule
    Loop not vectorized/parallelized: contains call
14, Parallel region terminated

whereas compiling with GCC using

g++ -O3 test2.cpp -fopenmp -fopt-info

gives

test2.cpp:17: note: not vectorized: loop contains function calls or data references that cannot be analyzed
test2.cpp:17: note: bad data references.
