0

I declared two GPU memory pointers, allocated the GPU memory, transferred the data, and launched the kernel in main:

// declare GPU memory pointers
// NOTE(review): these are raw pointers to linear (1D) device memory;
// they cannot be double-subscripted like a 2D array inside the kernel.
char * gpuIn;
char * gpuOut;

// allocate GPU memory
// NOTE(review): cudaMalloc/cudaMemcpy/kernel launches all report errors via
// cudaError_t / cudaGetLastError(), but every return value is ignored here —
// add error checking to get a concrete diagnostic instead of "some errors".
cudaMalloc(&gpuIn, ARRAY_BYTES);
cudaMalloc(&gpuOut, ARRAY_BYTES);

// transfer the array to the GPU
// (ARRAY_BYTES and currIn are defined outside this snippet — not visible here)
cudaMemcpy(gpuIn, currIn, ARRAY_BYTES, cudaMemcpyHostToDevice);

// launch the kernel
// one block of 40x20 threads: threadIdx.x in [0,40), threadIdx.y in [0,20)
role<<<dim3(1),dim3(40,20)>>>(gpuOut, gpuIn);

// copy back the result array to the CPU
// cudaMemcpy(..., cudaMemcpyDeviceToHost) blocks until the kernel finishes,
// but any kernel error surfaces here unchecked.
cudaMemcpy(currOut, gpuOut, ARRAY_BYTES, cudaMemcpyDeviceToHost);

cudaFree(gpuIn);
cudaFree(gpuOut);

And this is my code inside the kernel:

// Kernel as posted in the question. As written it does NOT compile:
// gpuOut/gpuIn are plain char* (pointers to linear memory), so the double
// subscript gpuIn[idx][idy] is invalid — this is the reported
// "expression must have pointer-to-object type" error. The fix is to
// index linearly (e.g. [idy + idx * rowWidth]) or wrap the pointer in a
// 2D accessor, as shown in the answer below.
__global__ void role(char * gpuOut, char * gpuIn){
    int idx = threadIdx.x;   // thread's x coordinate within the block
    int idy = threadIdx.y;   // thread's y coordinate within the block

    char live = '0';
    char dead = '.';

    // ERROR: gpuIn[idx] is a char, which cannot be subscripted again.
    char f = gpuIn[idx][idy];

    if(f==live){ 
       gpuOut[idx][idy]=dead;
    }
    else{
       gpuOut[idx][idy]=live;
    } 
}

But there are some errors; I think the errors are related to the pointers. Can anybody give me some help?

Boming YU
  • 61
  • 6
  • 1
    "Some error": What kind of errors, specifically? What is the exact error message? What happens if you add proper error checking to the CUDA API calls? – njuffa Apr 27 '17 at 00:22
  • error 1. "expression must have pointer-to-object type" on line of char f = gpuIn[idx][idy]; , gpuOut[idx][idy]=dead; and gpuOut[idx][idy]=live; inside of the kernel. error 2. "argument of type "char *" is incompatible with parameter of type "char"" on the line which I launched the kernel in the main role<<>>(gpuOut, gpuIn); – Boming YU Apr 27 '17 at 00:53
  • 3
    Well, since `gpuIn` in your kernel is a pointer to `char`, you can't do a double-dereference of it like `gpuIn[idx][idy]`; that wouldn't work in ordinary C or C++ code, so it's no surprise that it doesn't work in CUDA. You should provide a [mcve]. You can edit your question, you don't need to jam this stuff into comments. – Robert Crovella Apr 27 '17 at 01:40
  • I'm trying to get the position in the 2D array by the pointers 'gpuIn' and 'gpuOut' in the kernel. How can I do this? – Boming YU Apr 27 '17 at 02:02
  • This is really a duplicate of http://stackoverflow.com/a/18930734/681865 – talonmies Apr 27 '17 at 07:55

1 Answer

2

The key concept is the storage order of multidimensional arrays in memory -- this is well described here. A useful abstraction is to define a simple class which encapsulates a pointer to a multidimensional array stored in linear memory and provides an operator which gives something like the usual a[i][j] style access. Your code could be modified something like this:

// Lightweight, non-owning view of a 2D array stored row-major in linear
// memory. Usable from both host and device code; a(i, j) maps to element
// i * lda + j of the underlying buffer.
template<typename T>
struct array2d
{
    T* p;        // base pointer into the linear buffer (not owned)
    size_t lda;  // leading dimension: number of elements per row

    __device__ __host__
    array2d(T* base, size_t pitch) : p(base), lda(pitch) {};

    // Mutable element access: a(i, j) == p[i * lda + j].
    __device__ __host__
    T& operator()(size_t i, size_t j) {
        return p[i * lda + j];
    }

    // Read-only element access for const views of the same buffer.
    __device__ __host__
    const T& operator()(size_t i, size_t j) const {
        return p[i * lda + j];
    }
};

// Toggle kernel: each thread flips exactly one cell — a live cell ('0')
// becomes dead ('.') and any other value becomes live. Expects a single
// block whose blockDim covers the whole array: threadIdx.x selects the
// row and threadIdx.y selects the column.
__global__ void role(array2d<char> gpuOut, array2d<char> gpuIn){
    const char live = '0';
    const char dead = '.';

    const int row = threadIdx.x;
    const int col = threadIdx.y;

    gpuOut(row, col) = (gpuIn(row, col) == live) ? dead : live;
}

// Bail out of main with the (non-zero) CUDA error code as the exit status
// whenever a runtime call fails; kernel launches are checked separately via
// cudaGetLastError()/cudaDeviceSynchronize() below.
#define CUDA_CHECK(call)                              \
    do {                                              \
        cudaError_t err_ = (call);                    \
        if (err_ != cudaSuccess) return (int)err_;    \
    } while (0)

int main()
{        
    const int rows = 5, cols = 6;
    const size_t ARRAY_BYTES = sizeof(char) * size_t(rows * cols);

    // declare GPU memory pointers
    char * gpuIn;
    char * gpuOut;

    char currIn[rows][cols], currOut[rows][cols];

    // BUG FIX: the original copied currIn to the GPU without ever writing
    // to it, so indeterminate host bytes were transferred. Fill it with a
    // deterministic checkerboard of live ('0') and dead ('.') cells.
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            currIn[r][c] = ((r + c) % 2 == 0) ? '0' : '.';

    // allocate GPU memory
    CUDA_CHECK(cudaMalloc(&gpuIn, ARRAY_BYTES));
    CUDA_CHECK(cudaMalloc(&gpuOut, ARRAY_BYTES));

    // transfer the array to the GPU
    CUDA_CHECK(cudaMemcpy(gpuIn, currIn, ARRAY_BYTES, cudaMemcpyHostToDevice));

    // launch the kernel: one block of rows x cols threads, one per cell
    role<<<dim3(1),dim3(rows,cols)>>>(array2d<char>(gpuOut, cols), array2d<char>(gpuIn, cols));
    CUDA_CHECK(cudaGetLastError());        // catch launch-configuration errors
    CUDA_CHECK(cudaDeviceSynchronize());   // catch asynchronous execution errors

    // copy back the result array to the CPU
    CUDA_CHECK(cudaMemcpy(currOut, gpuOut, ARRAY_BYTES, cudaMemcpyDeviceToHost));

    CUDA_CHECK(cudaFree(gpuIn));
    CUDA_CHECK(cudaFree(gpuOut));

    return 0;
}

The important point here is that a two dimensional C or C++ array stored in linear memory can be addressed as col + row * number of cols. The class in the code above is just a convenient way of expressing this.

talonmies
  • 70,661
  • 34
  • 192
  • 269