I am experimenting with OpenMP and the target directives. I am using omp_target_alloc() to allocate a buffer on the device directly, and then I try to write to this buffer inside a target region. Unfortunately, when I try to do so, I get a segmentation fault. Interestingly, if I use directives instead of the function omp_target_alloc(), the program does not crash. And to make things even more interesting, both omp_get_num_devices() and omp_get_default_device() return 0. Here is my code:

#include <cstdlib>   // malloc, free
#include <iostream>
#include <omp.h>

//#define USE_TARGET // Uncomment this to see the segmentation fault

int main(int argc, char** argv)
{
    const unsigned long int N = 100000;

    std::cout << "Number of devices: " << omp_get_num_devices() << std::endl;
    std::cout << "Default device: " << omp_get_default_device() << std::endl;

    // Allocate
#ifdef USE_TARGET
    float* buffer = (float*)omp_target_alloc(N*sizeof(float), 0);
#else
    float* buffer = (float*)malloc(N*sizeof(float));
#endif

    // Evaluate
#ifdef USE_TARGET
    #pragma omp target is_device_ptr(buffer)
    {
#else
    #pragma omp target data map(tofrom:buffer[0:N])
    {
#endif
        #pragma omp parallel for
        for(unsigned long int i = 0; i < N; ++i)
            buffer[i] = i;
    }

    // Cleanup
#ifdef USE_TARGET
    omp_target_free(buffer, 0);
#else
    free(buffer);
#endif

    return 0;
}

Can someone explain to me why the above code produces a segmentation fault when I define USE_TARGET? What do I need to do to fix this code?

I use "device 0" in my call to omp_target_alloc(). I assume that device 0 is the CPU itself. Right? I know that the target directives and omp_target_alloc() are a bit pointless on a CPU, but my aim is to write code that runs both on accelerators and on CPUs.

Also, g++ --version gives me this: g++ (Debian 7.3.0-5) 7.3.0

AstrOne
  • I doubt you can just dereference a device pointer. – Vladimir F Героям слава Mar 02 '18 at 09:52
  • I get a bunch of errors compiling your code such as `mkoffload: fatal error: x86_64-linux-gnu-accel-nvptx-none-gcc-7`. You must be using a version of GCC with GPU offloading support (it's not in the default version of GCC on Ubuntu 17.10). – Z boson Mar 02 '18 at 11:30
  • I got it to compile with `g++ -Wall omp.cpp -fopenmp -foffload=nvptx-none -fno-stack-protector`. It does not crash for me. – Z boson Mar 04 '18 at 15:06
  • Actually `g++ omp.cpp -Wall -fopenmp -fno-stack-protector` is sufficient. I checked that it was going to the GPU with sudo nvprof ./a.out. But `Number of devices: 1` so I'm a bit confused still. – Z boson Mar 05 '18 at 11:38

1 Answer

The problem is that, according to the OpenMP 4.5 specification (section 3.5.1), omp_target_alloc takes the argument

device_num, which must be greater than or equal to zero and less than the result of omp_get_num_devices()

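In your case, your installation of GCC does not see any non-host devices: omp_get_num_devices() returns zero, so 0 is not a valid device_num. omp_target_alloc therefore returns NULL. Because there is no device, the target region falls back to running on the host, and writing through that NULL pointer is what crashes: http://coliru.stacked-crooked.com/a/00054f08e8e1322e. Note also that device 0 is not necessarily the host (CPU); the device number that refers to the host is whatever omp_get_initial_device() returns.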

On my system omp_get_num_devices returns 1 and your code does not crash.
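Since an invalid device number only shows up as a NULL pointer, it can help to validate the number and check the result before entering the target region. The following is a minimal sketch, not part of the original fix: the helper names `is_valid_device` and `checked_target_alloc` are hypothetical, and treating the value of `omp_get_initial_device()` as valid is an assumption based on the fix further down, which passes that value to `omp_target_alloc`. The `#ifdef _OPENMP` guards only let the file compile without `-fopenmp`.

```cpp
#include <cassert>   // assert
#include <cstddef>   // std::size_t
#include <cstdio>    // std::fprintf
#ifdef _OPENMP
#include <omp.h>     // omp_get_num_devices, omp_get_initial_device, omp_target_alloc
#endif

// Hypothetical helper: true if dev names a non-host device
// (0 <= dev < omp_get_num_devices()) or the host itself (assumption, see above).
bool is_valid_device(int dev)
{
#ifdef _OPENMP
    return (dev >= 0 && dev < omp_get_num_devices())
        || dev == omp_get_initial_device();
#else
    (void)dev;
    return false;  // no OpenMP runtime, so no device memory API at all
#endif
}

// Hypothetical helper: allocates n floats on device dev, or returns nullptr
// with a message instead of handing back a pointer that crashes later.
float* checked_target_alloc(std::size_t n, int dev)
{
    assert(n > 0 && "allocation size must be positive");
    if (!is_valid_device(dev)) {
        std::fprintf(stderr, "device %d is not a valid device number\n", dev);
        return nullptr;
    }
#ifdef _OPENMP
    void* p = omp_target_alloc(n * sizeof(float), dev);
    if (p == nullptr)
        std::fprintf(stderr, "omp_target_alloc failed on device %d\n", dev);
    return static_cast<float*>(p);
#else
    return nullptr;
#endif
}
```

With a check like this, a bad device number is reported at the allocation instead of surfacing as a segmentation fault inside the target region.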


I just learned that I can get your code to crash on my system as well by compiling with -foffload=disable:

g++ -Wall -O3 -fopenmp -foffload=disable -fno-stack-protector foo.cpp

This disables the offloading and causes omp_target_alloc to return NULL.

You can fix your code using the following method:

bool offload;
#pragma omp target defaultmap(tofrom:scalar)
offload = !omp_is_initial_device();
int device = offload ? 0: omp_get_initial_device();

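This launches a small target region as a probe: if no non-host device is available, the region falls back to the host, where omp_is_initial_device() returns true. So device is set to the host device if there is no non-host device; otherwise it is set to 0. I got this idea here. Then use device instead of 0 for the device parameters, like this: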

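#include <cstdio>    // printf
#include <iostream>
#include <omp.h>

int main(int argc, char** argv)
{
    const unsigned long int N = 100000;

    std::cout << "Number of devices: " << omp_get_num_devices() << std::endl;
    std::cout << "Default device: " << omp_get_default_device() << std::endl;

    // Probe: this target region runs on the host when no device is
    // available, in which case omp_is_initial_device() returns true there.
    bool offload = false;
    #pragma omp target defaultmap(tofrom:scalar)
    offload = !omp_is_initial_device();
    int device = offload ? 0: omp_get_initial_device();

    if (offload) {
      printf("Able to use offloading!\n");
    }

    // Allocate on the selected device (the host itself if offloading is unavailable)
    float* buffer = (float*)omp_target_alloc(N*sizeof(float), device);
    if (buffer == NULL) {
      printf("omp_target_alloc failed on device %d\n", device);
      return 1;
    }

    // Evaluate
    #pragma omp target is_device_ptr(buffer)
    #pragma omp parallel for
    for(unsigned long int i = 0; i < N; ++i) buffer[i] = i;

    // Cleanup: free with the same device number used for the allocation
    omp_target_free(buffer, device);

    return 0;
}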

http://coliru.stacked-crooked.com/a/5104446f789e2fc9
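If launching a probe target region just to pick a device feels heavy, a simpler variant decides from the device count alone. This is a sketch of an alternative, not the method above: `pick_device` is a hypothetical helper, and unlike the probe it relies on what omp_get_num_devices() reports rather than observing where a target region actually executes.

```cpp
#include <cassert>  // assert
#ifdef _OPENMP
#include <omp.h>    // omp_get_num_devices, omp_get_default_device, omp_get_initial_device
#endif

// Hypothetical helper: use the default offload device if any non-host device
// exists, otherwise fall back to the number that denotes the host.
int pick_device()
{
#ifdef _OPENMP
    const int n = omp_get_num_devices();
    const int dev = n > 0 ? omp_get_default_device() : omp_get_initial_device();
    // Sanity check: the result should be a number omp_target_alloc() accepts.
    assert((dev >= 0 && dev < n) || dev == omp_get_initial_device());
    return dev;
#else
    return 0;  // placeholder: without OpenMP there is no device memory API
#endif
}
```

The returned value would take the place of device in the omp_target_alloc and omp_target_free calls.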

Z boson