0

I'd like to run a parallel for loop to initialize a 2 dimensional buffer with random values. But I get an EXC_BAD_ACCESS (code=EXC_I386_GPFLT) exception in the first line of the kernel.

This is the Code (the Pixel struct is from the PixelGameEngine Library):

namespace olc
{
    struct Pixel
    {
        union
        {
            uint32_t n = nDefaultPixel;
            struct
            {
                uint8_t r;
                uint8_t g;
                uint8_t b;
                uint8_t a;
            };
        };

        enum Mode
        {
            NORMAL, MASK, ALPHA, CUSTOM
        };

        Pixel()
        {
            r = 0;
            g = 0;
            b = 0;
            a = nDefaultAlpha;
        }

        Pixel(uint8_t red, uint8_t green, uint8_t blue, uint8_t alpha = nDefaultAlpha)
        {
            n = red | (green << 8) | (blue << 16) | (alpha << 24);
        }

        Pixel(uint32_t p) { n = p; }
    };
}

 int main()
{
    auto* screenBuffer = new olc::Pixel[256 * 240];

    cl::sycl::queue queue;
    {
        cl::sycl::buffer<olc::Pixel, 2> b_screenBuffer(screenBuffer,
                                                       cl::sycl::range<2>(256, 240));

        queue.submit([&](cl::sycl::handler& cgh) {
            auto screenBuffer_acc = b_screenBuffer.get_access<cl::sycl::access::mode::write>(cgh);

            cgh.parallel_for(cl::sycl::range<2>(256, 240), [&](cl::sycl::id<2> index) {
                screenBuffer_acc[index] = olc::Pixel(rand() % 256, //Here I get the exception 
                                                     rand() % 256,
                                                     rand() % 256);
            });
        });
    }

    return 0;
}

I'm using triSYCL 0.1.0 on Mac OS Catalina with boost 1.74 and C++20. I've tried to change the data type of the buffer from old::Pixel to float and to make the buffer 1 dimensional but I always got the same exception. Does anybody have an idea what I could try?

Gian Laager
  • 474
  • 4
  • 14

2 Answers2

3

Your code is fine apart from one small issue:

cgh.parallel_for(cl::sycl::range<2>(256, 240), [&](cl::sycl::id<2> index) {...}

Here you are capturing variables by reference in your kernel. Don't ever do this. If you are targeting an accelerator (e.g. GPU), the captured variables will reference host memory and crash when accessed on device. If you only target CPU, it's still undefined behavior since the kernel is executed asynchronously and the captured variables might not exist anymore when the kernel is run.

Therefore, always capture by value in kernel lambdas:

cgh.parallel_for(cl::sycl::range<2>(256, 240), [=](cl::sycl::id<2> index) {...}

In the lambda for command group scope queue.submit([&](sycl::handler& cgh){...}) it is always fine to capture by reference since the command group scope is always evaluated on the host synchronously.

I don't have a triSYCL install at hand, but after this small modification your code ran fine with hipSYCL (which also runs on Mac by the way, in case you'd like to double-check your code with two implementations which can be helpful sometimes). If changing the lambda capture doesn't solve your problem you might want to open an issue on triSYCL's github repository.

illuhad
  • 506
  • 2
  • 5
  • thanks @illuhad. So the accessor actually is like a pointer that you can copy as value and then still modify the location that you want, I fought that you would have to take it by reference to modify the right memory. – Gian Laager Sep 30 '20 at 15:18
  • That's right. To be more precise, On the host (i.e. outside of your parallel_for) it provides mainly a description of the requested memory access, since the actual memory might not even exist yet because a SYCL implementation is free to do lazy memory allocation. This description is used by the SYCL runtime e.g. as additional information for scheduling. Inside the kernel, it also provides a view on the requested data. – illuhad Sep 30 '20 at 23:03
0

I agree with @illuhad.

I have tried with triSYCL and added on the top some imaginative

#include "CL/sycl.hpp"
constexpr auto nDefaultPixel = 2;
constexpr auto nDefaultAlpha = 42;

to get it compiling (please post some complete code sample) and it works.

The reason why you have to use [=] instead of [&] is because SYCL targets also architectures where you do not have host memory directly accessible from the accelerator.