
I think an array can be allocated on the GPU without `cudaMalloc`, e.g. `__device__ int device_array[100];`, since the length is known at compile time. But when I run the following code, some irrelevant numbers are displayed. I examined a popular CUDA book, and all of its examples use `cudaMalloc`. Can a fixed-size array be used like this, or must it be allocated with `cudaMalloc`?

#include <iostream>
using std::cout;
using std::endl;

__device__ int device_array[100];

__global__ void kernel() {
    device_array[blockIdx.x] = blockIdx.x;
}

void call_kernel( int *host_array ) {
    kernel<<<100,1>>>();
    cudaMemcpy( host_array, device_array, 100 * sizeof( int ), cudaMemcpyDeviceToHost );
}

int main() {
    int host_array[100];

    call_kernel( host_array );

    for ( int i = 0; i < 100; i++ )
        cout << host_array[i] << endl;
}
Ian Decks
  • Your code has no error checking. It is probable that the `cudaMemcpy` call is failing, but you just don't know it because you are not checking the return status. Once you confirm that an error is occurring at runtime, the source of the problem will become apparent. – talonmies Mar 28 '13 at 19:28
  • More [clues](http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#device-variable-qualifier). How to do error checking is nicely discussed [here](http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api). – Robert Crovella Mar 28 '13 at 19:32

1 Answer


As Robert alluded to in his comment, you have to use `cudaMemcpyFromSymbol` when accessing a `__device__` symbol from the host. Your `cudaMemcpy` call in its present form should therefore be returning an error along the lines of "invalid argument". If you want to see this, change your `cudaMemcpy` line to:

cudaError_t cuda_status = cudaMemcpy(...); 
std::cout << cudaGetErrorString(cuda_status) << std::endl;

Anyway, to get the right answer, change your `cudaMemcpy` line to:

cudaMemcpyFromSymbol( host_array, device_array, 100 * sizeof( int ), 0, cudaMemcpyDeviceToHost);

The signature for cudaMemcpyFromSymbol is:

cudaError_t cudaMemcpyFromSymbol ( void* dst, const void* symbol, size_t count, size_t offset = 0, cudaMemcpyKind kind = cudaMemcpyDeviceToHost )

The offset defaults to 0 and the copy direction defaults to `cudaMemcpyDeviceToHost`, so both are technically optional in your case. The main takeaway from all this is to always check the return values of your CUDA calls, as they generally point you in the right direction.
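To make that concrete, here is a sketch of the whole program with the symbol copy and basic error checking folded in. The `CUDA_CHECK` macro is my own convention, not something from the CUDA runtime; only `cudaGetErrorString`, `cudaGetLastError`, and `cudaMemcpyFromSymbol` are real API calls:

```cpp
#include <cstdlib>
#include <iostream>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a readable message if a CUDA call fails.
#define CUDA_CHECK(call)                                                    \
    do {                                                                    \
        cudaError_t err_ = (call);                                          \
        if (err_ != cudaSuccess) {                                          \
            std::cerr << "CUDA error at " << __FILE__ << ":" << __LINE__    \
                      << ": " << cudaGetErrorString(err_) << std::endl;     \
            std::exit(EXIT_FAILURE);                                        \
        }                                                                   \
    } while (0)

__device__ int device_array[100];

__global__ void kernel() {
    device_array[blockIdx.x] = blockIdx.x;
}

int main() {
    int host_array[100];

    kernel<<<100, 1>>>();
    CUDA_CHECK(cudaGetLastError());  // catch kernel launch errors

    // Copy from the __device__ symbol; offset and direction use the defaults.
    CUDA_CHECK(cudaMemcpyFromSymbol(host_array, device_array,
                                    100 * sizeof(int)));

    for (int i = 0; i < 100; i++)
        std::cout << host_array[i] << std::endl;
}
```

With the checks in place, the original `cudaMemcpy` version would have stopped immediately with "invalid argument" instead of silently printing garbage.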

alrikai
  • If you want to copy from the Host to Device in the same manner, then you might be better served by the function "cudaMemcpyToSymbol". I'd advise taking a look at the available CUDA runtime API functions over at http://docs.nvidia.com/cuda/cuda-runtime-api/index.html#group__CUDART__MEMORY – alrikai Mar 28 '13 at 20:13