
Right now I'm trying to pass a pointer to a pointer into an OpenCL 2.1 kernel using Shared Virtual Memory. From what I've read this should be possible, but when I attempt to build the kernel I get the following error:

```
kernel parameter cannot be declared as a pointer to a pointer

    __kernel void MyKernel(__global float** doubleP) {
                                            ^
1 error generated.

error: Clang front-end compilation failed!
Frontend phase failed compilation.
Error: Compiling CL to IR
```

So what is the correct way to pass a pointer to a pointer as a kernel argument?

Ethan
  • Why do you need to do this? There's a reason it's impossible: on the OpenCL side, pointers are in the device's address space, while on the host side they are in the host's address space. There's no logical way to equate them. – Mack Aug 07 '21 at 14:12
  • @Mack I'm trying to write a back-propagation algorithm, which means dealing with some fairly complex superscripts, so I want to be able to write something like P[i][j][k] instead of P[(i * n1) + (j * n2) + k]. Furthermore, I can't use arrays because they can't store enough data. – Ethan Aug 08 '21 at 04:17
  • Using indirect arrays like this uses up *more* memory than a flat array, because each level of indirection requires memory for all the pointers (and it is slower, if that matters). Have you tried using a flat array? You can easily abstract the index calculation as a macro. There is nothing that will let you dereference a pointer in host memory on the device. – Mack Aug 08 '21 at 05:33
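A minimal sketch of the macro approach from the last comment, assuming a 3D array stored flat in row-major order. `IDX3`, `NX`, `NY`, `NZ` and `fill` are hypothetical names chosen for illustration; the strides correspond to n1 = NY * NZ and n2 = NZ in the index formula from the previous comment. It is written as plain C99 so it can be tried on the host; the `IDX3` macro itself should work unchanged inside an OpenCL C kernel.

```c
#include <stddef.h>

/* Hypothetical dimensions of a 3D array P[NX][NY][NZ] stored flat, row-major. */
#define NX 4
#define NY 3
#define NZ 5

/* Row-major linear index: k varies fastest, then j, then i.
   Equivalent to (i * n1) + (j * n2) + k with n1 = NY * NZ and n2 = NZ. */
#define IDX3(i, j, k) ((((size_t)(i) * NY) + (size_t)(j)) * NZ + (size_t)(k))

/* Fill a flat array so that element (i, j, k) holds 100*i + 10*j + k. */
void fill(float *p) {
    for (size_t i = 0; i < NX; ++i)
        for (size_t j = 0; j < NY; ++j)
            for (size_t k = 0; k < NZ; ++k)
                p[IDX3(i, j, k)] = (float)(100 * i + 10 * j + k);
}
```

In a kernel this would be used as `P[IDX3(i, j, k)]`, which reads close to `P[i][j][k]` without any pointer indirection.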

1 Answer


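Maybe your compiler defaults to OpenCL C 1.2, where this is not allowed. From the OpenCL 1.2 restrictions: "Arguments to __kernel functions in a program cannot be declared as a pointer to a pointer(s)". If no `-cl-std` build option is given, the highest OpenCL C 1.x version supported by the device is used, so you need to pass `-cl-std=CL2.0` (or later) in the options string of `clBuildProgram`.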

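In OpenCL 2.1, it should be possible in principle, provided that both the outer array and every inner array it points to are SVM allocations (e.g. from `clSVMAlloc`), so that the stored pointers are valid on the device. However, it is still better to use a linearized 1D array, as this allows much faster coalesced memory access. To linearize a 2D index, you can use: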

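```c
// size_x (the array width) is assumed to be a compile-time constant,
// e.g. passed to clBuildProgram as the build option "-D size_x=1024"
uint __attribute__((always_inline)) index(const uint x, const uint y) {
    return x+y*size_x; // linear index of element (x, y)
}
uint2 __attribute__((always_inline)) position(const uint n) {
    uint2 r; // inverse mapping: recover (x, y) from the linear index n
    r.x = n%size_x;
    r.y = n/size_x;
    return r;
}
// to access the 1D linearized array, use "float value = array[index(x, y)];"
```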

You can extend this to any number of dimensions. It's best to move the index conversion into a small inlined function in OpenCL C. Array size should not be a problem: you can allocate 1/4 of the total video memory on any device for a single array, and most GPUs nowadays support array sizes of the full available video memory (the exact per-allocation limit is reported by `clGetDeviceInfo` with `CL_DEVICE_MAX_MEM_ALLOC_SIZE`).
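Extending the `index`/`position` pair to three dimensions might look like the sketch below. It is plain C so it can be checked on the host; the names `index3`/`position3`, the `pos3` struct (a stand-in for OpenCL's built-in `uint3`) and the constants `SIZE_X`/`SIZE_Y`/`SIZE_Z` are assumptions for illustration. It uses `size_t` for the linear index, since for arrays approaching the size of video memory a 32-bit `uint` index overflows once the element count exceeds 2^32.

```c
#include <stddef.h>

/* Hypothetical grid dimensions; in a kernel these would typically be
   compile-time constants (e.g. set with the "-D" build option). */
#define SIZE_X 8
#define SIZE_Y 4
#define SIZE_Z 3

/* Host-side stand-in for OpenCL's built-in uint3 vector type. */
typedef struct { size_t x, y, z; } pos3;

/* (x, y, z) -> linear index; x varies fastest, as in index(x, y) = x + y*size_x. */
static inline size_t index3(size_t x, size_t y, size_t z) {
    return x + SIZE_X * (y + SIZE_Y * z);
}

/* Linear index -> (x, y, z); the inverse of index3. */
static inline pos3 position3(size_t n) {
    pos3 r;
    r.x = n % SIZE_X;
    n /= SIZE_X;        /* n is now y + SIZE_Y * z */
    r.y = n % SIZE_Y;
    r.z = n / SIZE_Y;
    return r;
}
```

Keeping x as the fastest-varying coordinate means that, when consecutive work-items map to consecutive x, they touch neighbouring addresses, which is what makes the coalesced access mentioned above possible.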

ProjectPhysX