How to use and declare global device array in CUDA?

Question

I want to declare a global-scope array in device memory, allocated at runtime, that is accessible to all of the device functions. Declaring it like this satisfies the compiler:

```cuda
__device__ float *g_array;
```

However, I now get compiler warnings on the cudaMalloc and cudaMemcpy calls saying that I am accessing a device variable from a host function. I assumed I needed the actual device address, so I tried cudaGetSymbolAddress, but that fails with an "invalid device symbol" error.

This is basically what I want to do:

```cuda
__device__ float *g_array;
__constant__ int N;

__global__ void device_add_one()
{
    // Grid-stride loop over all N elements
    for (int index = blockIdx.x * blockDim.x + threadIdx.x;
         index < N;
         index += blockDim.x * gridDim.x)
    {
         g_array[index] += 1;
    }
}

void init_arrays(float *host_array, int size)
{
    // These host-side uses of g_array produce the warnings described above
    cudaMalloc(&g_array, size * sizeof(float));
    cudaMemcpy(g_array, host_array, size * sizeof(float), cudaMemcpyHostToDevice);

    cudaMemcpyToSymbol(N, &size, sizeof(N));

    device_add_one<<<1, 254>>>();

    cudaMemcpy(host_array, g_array, size * sizeof(float), cudaMemcpyDeviceToHost);
}
```

Note: It has to work with compute capability 2.0.

The answer [here](https://stackoverflow.com/questions/28821743/sharing-roots-and-weights-for-many-gauss-legendre-quadrature-in-gpus/28822918#28822918) demonstrates how to do a dynamic allocation for a `__device__` pointer. It is necessary to create a temp pointer and do an ordinary `cudaMalloc` style allocation on that temp pointer, then copy that allocated pointer value to the `__device__` (pointer) variable. The proximal reason for this is that you are *not allowed to take the address of a `__device__` construct in host code*. — Robert Crovella, Aug 11 '17 at 03:04
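Applied to the question's code, the temporary-pointer approach might look like the minimal sketch below. It assumes the CUDA 5.0+ runtime API, where the symbol itself (not a string) is passed to `cudaMemcpyToSymbol`; the `CUDA_CHECK` macro and the `d_tmp` name are illustrative additions, not part of the original. The key point is that host code only ever touches `d_tmp`, while `g_array` is written once via `cudaMemcpyToSymbol` and then read only by the kernel.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative helper: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                          \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            std::fprintf(stderr, "%s failed: %s\n", #call,        \
                         cudaGetErrorString(err_));               \
            std::exit(EXIT_FAILURE);                              \
        }                                                         \
    } while (0)

__device__ float *g_array;   // device-side pointer, its value is set from the host
__constant__ int N;          // number of elements g_array points to

__global__ void device_add_one()
{
    // Grid-stride loop: any launch configuration covers all N elements.
    for (int index = blockIdx.x * blockDim.x + threadIdx.x;
         index < N;
         index += blockDim.x * gridDim.x)
    {
        g_array[index] += 1;
    }
}

void init_arrays(float *host_array, int size)
{
    // 1. Allocate through an ordinary host-side temporary pointer.
    float *d_tmp = NULL;
    CUDA_CHECK(cudaMalloc(&d_tmp, size * sizeof(float)));
    CUDA_CHECK(cudaMemcpy(d_tmp, host_array, size * sizeof(float),
                          cudaMemcpyHostToDevice));

    // 2. Copy the pointer *value* into the __device__ variable,
    //    and the element count into the __constant__ variable.
    CUDA_CHECK(cudaMemcpyToSymbol(g_array, &d_tmp, sizeof(d_tmp)));
    CUDA_CHECK(cudaMemcpyToSymbol(N, &size, sizeof(N)));

    device_add_one<<<1, 254>>>();
    CUDA_CHECK(cudaGetLastError());

    // 3. Host code keeps using d_tmp (never g_array) for copies and frees.
    CUDA_CHECK(cudaMemcpy(host_array, d_tmp, size * sizeof(float),
                          cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_tmp));
}
```

Because a `cudaMemcpy` on the default stream waits for the preceding kernel to finish, no explicit `cudaDeviceSynchronize` is needed before copying the results back.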
[Here](https://stackoverflow.com/questions/16929971/cudamalloc-global-array-cause-seg-fault) is another example. Your question is arguably a duplicate of that one. — Robert Crovella, Aug 11 '17 at 03:08
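The `cudaGetSymbolAddress` attempt from the question can also be made to work, with one caveat: the address it returns is the device address of the pointer variable `g_array` itself, not of any array, so it is a destination for the pointer value rather than for the data. The sketch below assumes the CUDA 5.0+ runtime API, in which the symbol is passed directly rather than as a string; the helper name `set_g_array_via_symbol_address` is hypothetical. One possible cause of the "invalid device symbol" error, not confirmed by the question, is passing the symbol name as a string, which newer CUDA releases no longer accept.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ float *g_array;   // pointer variable that lives in device memory

// Hypothetical helper: allocates `size` floats, stores the allocation's
// address in g_array, and returns the same address in *out_buffer so the
// host can use it with cudaMemcpy/cudaFree. Returns 0 on success.
int set_g_array_via_symbol_address(float **out_buffer, int size)
{
    float *d_buf = NULL;
    cudaError_t err = cudaMalloc(&d_buf, size * sizeof(float));
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return -1;
    }

    // Device address of the variable g_array itself: pass the symbol,
    // not &g_array and not the string "g_array".
    void *sym_addr = NULL;
    err = cudaGetSymbolAddress(&sym_addr, g_array);
    if (err == cudaSuccess) {
        // Write the pointer value (sizeof(float*) bytes) into g_array.
        err = cudaMemcpy(sym_addr, &d_buf, sizeof(d_buf),
                         cudaMemcpyHostToDevice);
    }
    if (err != cudaSuccess) {
        std::fprintf(stderr, "symbol update: %s\n", cudaGetErrorString(err));
        cudaFree(d_buf);
        return -1;
    }

    *out_buffer = d_buf;
    return 0;
}
```

The effect is the same as `cudaMemcpyToSymbol(g_array, &d_buf, sizeof(d_buf))`; going through the symbol address is mainly useful when a raw device pointer to the variable itself is needed.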
