cudaMemcpyToSymbol fails when called from constructor

Question

I could not find anything in the CUDA documentation that explains why cudaMemcpyToSymbol fails (with cudaErrorInvalidSymbol) in the following code, where it is called from the constructor of a static object:

#include <cstdio>

__constant__ float dev[2];

struct Struct
{
    void construct()
    {
        float host[2] = {1, 2};
        cudaError_t error = cudaMemcpyToSymbol(dev, host, sizeof(host));
        printf("%s\n", cudaGetErrorString(error));
    }
};

class Class
{
public:
    Class()
    {
        s.construct();
    }

private:
    Struct s;
};

static Class instance;

int main()
{
}

It works, however, when construct() is called from a method invoked inside main:

class Class
{
public:
    void foo()
    {
        s.construct();
    }

private:
    Struct s;
};

static Class instance;

int main()
{
    instance.foo();
}

Although @talonmies appears to state otherwise in the answer given [here](http://stackoverflow.com/questions/24869167/cuda-calling-kernel-outside-main-function), I believe the underlying limitation is the same. The CUDA runtime has a lazy initialization model, and launching a kernel from a constructor that runs before main is not allowed; I believe the same applies to using a static `__constant__` or `__device__` symbol before main. Perhaps @talonmies will update the answer there if he agrees with me. - Robert Crovella, Aug 03 '15 at 19:28
Specifically, he states "This problem is completely limited to user kernels loaded at runtime via the runtime API.", but I suspect it may also include usage of `__constant__` or `__device__` symbols. If he is in agreement, I would suggest that this question be marked as a duplicate of that one. - Robert Crovella, Aug 03 '15 at 19:31
Yes, this looks to be the same thing. I think this behaviour probably changed around CUDA 4.0. It used to be that device symbol lookups didn't rely on the magic of `__cudaRegisterFatBinary` to work, but it seems they do now. - talonmies, Aug 03 '15 at 21:12
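
Following the reasoning in the comments above, one possible workaround is to keep the constructor of the static object free of CUDA runtime calls and to defer the copy until main has started. The sketch below is a minimal illustration under that assumption, not a confirmed fix from the thread; the `init()` method and the `initialized` flag are hypothetical names introduced here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__constant__ float dev[2];

struct Struct
{
    // Copies two host floats into the __constant__ symbol and returns the status.
    cudaError_t construct()
    {
        const float host[2] = {1.0f, 2.0f};
        return cudaMemcpyToSymbol(dev, host, sizeof(host));
    }
};

class Class
{
public:
    // The constructor makes no CUDA runtime calls, so it can safely run
    // during static initialization, before main.
    Class() = default;

    // Hypothetical init(): intended to be called only after main has started.
    // Repeated calls after a successful copy are no-ops.
    cudaError_t init()
    {
        if (initialized)
            return cudaSuccess;
        cudaError_t error = s.construct();
        initialized = (error == cudaSuccess);
        return error;
    }

private:
    Struct s;
    bool initialized = false;
};

static Class instance;

int main()
{
    // By this point the symbol is expected to be registered with the runtime.
    cudaError_t error = instance.init();
    printf("%s\n", cudaGetErrorString(error));
    return error == cudaSuccess ? 0 : 1;
}
```

A function-local static object, which C++ constructs on first call, would be another way to delay construction, provided the first call happens inside main.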
