
When compiling the code below with nvcc (CUDA 5.0), I get the error "illegal combination of memory qualifiers", as it is apparently impossible to have __global__ kernels in a class.

class A
{
public:
    __global__ static void kernel();
};

__global__ void A::kernel()
{}

I can understand this restriction for non-static members, but why does the error still occur when the kernel is declared static? Calling such a member would be no different from calling the function if it were declared in a namespace (A in this case).

A::kernel <<< 1, 1 >>> ();

Is there a reason I'm missing as to why this hasn't been implemented (yet)?

EDIT: Based on the responses in both the answers and comments, I haven't been clear enough on my question. My question is not why an error appears. Obviously, this is because it hasn't been implemented. My question is why it hasn't been implemented. So far, I haven't been able to think of a reason that keeps this feature from being implemented. I realize that I might have forgotten about a special case which would complicate matters, hence the question.

The reasons I believe this to be a reasonable feature are:

  • A static function doesn't have a this pointer. So even if the kernel is called on an object that lives on the host, there is no conflict in accessing its data, as this data is inaccessible in the first place (data from what object?).
  • You could argue that if the class has static data associated with it, living on the host, this should in principle be accessible from the static kernel. However, static data isn't supported either, so again no conflict.
  • Calling a static kernel on an object on the host (A a; a.staticKernel<<<...,...>>>();) would be entirely equivalent to calling it without the object at all (A::staticKernel<<<...,...>>>();), as we are used to in regular C++.
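For comparison, the namespace version of the same scoping compiles without complaint under nvcc. A minimal sketch (the empty kernel body and the launch in main are just for illustration):

```cuda
#include <cuda_runtime.h>

// Same qualified-name scoping, but via a namespace instead of a class:
// nvcc accepts this form without complaint.
namespace A
{
    __global__ void kernel() {}
}

int main()
{
    A::kernel<<<1, 1>>>();   // qualified call, exactly what the static member would look like
    cudaDeviceSynchronize();
    return 0;
}
```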

What am I missing?

JorenHeit
  • is your aim to have a class with a __global__ function? – 4pie0 Aug 31 '13 at 19:20
  • "My question is why it hasn't been implemented. So far, I haven't been able to think of a reason that keeps this feature from being implemented." – there are a million things that might be implemented, but there is no need for them and they are left unimplemented – 4pie0 Aug 31 '13 at 22:19
  • So your answer is: "It just isn't."? – JorenHeit Aug 31 '13 at 22:19
  • So the real purpose of this question is to have a debate about why the CUDA object model is the way it is? That isn't much of an [SO] question. Vote to close as primarily opinion based. – talonmies Sep 01 '13 at 05:17
  • I want to file a feature request with NVidia, because it seemed weird to me that this isn't supported. Before I do, I want to make sure there is no obvious reason that it is the way it is. Apparently there's not, and apparently that's a reason to downvote me. It has nothing to do with opinion, or debate. – JorenHeit Sep 01 '13 at 05:30
  • None of us here designed the CUDA object model. Therefore none of us can say why it isn't supported and any answers are, as a result, speculative at best. If I were *to guess* I would say it is because it breaks the compilation model - a __global__ function simultaneously compiles in *both* a host object and a device object. CUDA classes and structures can only ever be instantiated in one memory space. That would seem to preclude having __constant__ or __global__ objects inside structs and classes – talonmies Sep 01 '13 at 12:26
  • @talonmies: I could not have known in advance that there is no apparent answer to my question, so I could not have foreseen speculation. The downvotes and negative tone in the answers/comments are very unnecessary if you ask me. As to your *guess*: nothing is added to the class/struct (in memory) when you declare a static function inside of it. The declaration merely adds scope to the function, similar to a namespace. I will contact NVidia to see what they have to say. – JorenHeit Sep 02 '13 at 11:51

1 Answer


Fortunately, about 4 years after this question was asked, clang 4.0 can compile the CUDA language. Consider this example:

class A
{
public:
    __global__ static void kernel();
};

__global__ void A::kernel()
{}

int main()
{
    A::kernel <<< 1, 1 >>> ();
}

When I try to compile it with clang 4.0, I get the following error:

test.cu:7:1: error: kernel function 'kernel' must be a free function or static member function
__global__ void A::kernel()
^
/usr/local/cuda/include/host_defines.h:191:9: note: expanded from macro '__global__'
        __location__(global)
        ^
/usr/local/cuda/include/host_defines.h:88:9: note: expanded from macro '__location__'
        __annotate__(a)
        ^
/usr/local/cuda/include/host_defines.h:86:9: note: expanded from macro '__annotate__'
        __attribute__((a))
        ^
test.cu:7:20: error: __host__ function 'kernel' cannot overload __global__ function 'kernel'
__global__ void A::kernel()
                   ^
test.cu:4:28: note: previous declaration is here
    __global__ static void kernel();
                           ^
2 errors generated.

To resolve these errors, I've inlined the kernel definition into the class declaration:

class A
{
public:
    __global__ static void kernel()
    {
        // implementation would go here
    }
};

Then clang 4.0 compiles it successfully, and the program runs without errors. So this is clearly not a limitation of the CUDA language, but of its de facto standard compiler. Incidentally, nvcc has many similar unjustified limitations which clang does not have.
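For completeness, a minimal sketch of how the inlined static member kernel can be launched under clang (the empty kernel body is a placeholder):

```cuda
#include <cuda_runtime.h>

class A
{
public:
    // The definition must stay inside the class declaration for
    // clang 4.0 to accept a __global__ static member function.
    __global__ static void kernel() {}
};

int main()
{
    A::kernel<<<1, 1>>>();   // launched exactly like a free kernel
    cudaDeviceSynchronize();
    return 0;
}
```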

Jakub Klinkovský
  • @talonmies The section [E. C/C++ Language Support](http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#c-cplusplus-language-support) of the programming guide is merely a (partial) specification of what `nvcc` does or does not support. It has nothing to do with the CUDA language specification, which unfortunately does not exist. The [B. C Language Extensions](http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#c-language-extensions) section is close, but it is mixed with `nvcc` specifics and the CUDA Runtime API. – Jakub Klinkovský May 14 '17 at 21:28
  • Currently (CUDA 10.2), this still doesn't work if the kernel is templated, e.g. `template <typename T> __global__ static void kernel(T something){}`. Getting `warning: inline qualifier ignored for "__global__" function` without the inline qualifier being there, and then `error: illegal combination of memory qualifiers` on the same line with the function declaration. – Greg Kramida Jun 07 '20 at 23:12
  • Can confirm that CUDA 11.1 still has this flaw in nvcc. Same error message and everything. Now have to come up with a whole new design. – Meta.x.gdb Mar 12 '21 at 00:19