
I'm getting started with CUDA and have run into a problem. The code below is essentially the simplest example from the NVIDIA website, with some memory copies and a print statement added to check that it runs correctly.

The code compiles and runs without complaint, but when I print the vector c it comes out all zeros, as if the GPU kernel function isn't being called at all.

This is almost exactly the same as the post "Basic CUDA - getting kernels to run on the device using C++".

The symptoms are the same, although I don't seem to be making the mistake described there. Any ideas?

#include <stdio.h>

static const unsigned short N = 3;

// Kernel definition
__global__ void VecAdd(float* A, float* B, float* C)
{
    int i = threadIdx.x;
    C[i] = A[i] + B[i];
} 

int main()
{
  float *A, *B, *C;
  float a[N] = {1,2,3}, b[N] = {4,5,6}, c[N] = {0,0,0};

  cudaMalloc( (void **)&A, sizeof(float)*N );
  cudaMalloc( (void **)&B, sizeof(float)*N );
  cudaMalloc( (void **)&C, sizeof(float)*N );

  cudaMemcpy( A, a, sizeof(float)*N, cudaMemcpyHostToDevice );
  cudaMemcpy( B, b, sizeof(float)*N, cudaMemcpyHostToDevice );

  VecAdd<<<1, N>>>(A, B, C);

  cudaMemcpy( c, C, sizeof(float)*N, cudaMemcpyHostToDevice );

  printf("%f %f %f\n", c[0],c[1],c[2]);

  cudaFree(A);
  cudaFree(B);
  cudaFree(C);

  return 0;
}
user3195869
    Always always always check the return value of functions. After the kernel call, call `cudaGetLastError`, too. – Kerrek SB Feb 24 '14 at 08:52
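To illustrate the comment above: a kernel launch does not return an error code, so a rejected launch fails silently and leaves the output buffer untouched, which looks exactly like the all-zeros symptom in the question. The minimal sketch below (the `Noop` kernel is hypothetical, used only to have something to launch) deliberately requests 2048 threads per block, which exceeds the per-block limit (1024 threads, or 512 on compute capability 1.x devices), and shows where each kind of failure gets reported.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel; it only exists so there is something to launch.
__global__ void Noop(float* out)
{
  out[threadIdx.x] = 1.0f;
}

int main()
{
  float* d_out = nullptr;

  // 1. Check the return value of every runtime call.
  cudaError_t err = cudaMalloc((void**)&d_out, sizeof(float) * 2048);
  if (err != cudaSuccess) {
    printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
    return 1;
  }

  // 2. A launch returns nothing, so query the error right afterwards.
  //    2048 threads per block is over the per-block limit, so this launch
  //    is rejected and the kernel body never runs.
  Noop<<<1, 2048>>>(d_out);
  err = cudaGetLastError();
  printf("launch: %s\n", cudaGetErrorString(err));

  // 3. Errors raised while a kernel executes surface at the next
  //    synchronizing call.
  err = cudaDeviceSynchronize();
  printf("sync: %s\n", cudaGetErrorString(err));

  cudaFree(d_out);
  return 0;
}
```

Without step 2 the program would run to completion and print nothing suspicious, which is why checking after every launch matters.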

1 Answer


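In the last cudaMemcpy call, you are passing the wrong direction flag. The destination c is a host array and the source C is device memory, so the copy goes from device to host, but the call says host to device: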

cudaMemcpy( c, C, sizeof(float)*N, cudaMemcpyHostToDevice );

It should be:

cudaMemcpy( c, C, sizeof(float)*N, cudaMemcpyDeviceToHost );
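Combining that fix with the error checking suggested in the comments, a corrected version of the question's program might look like the sketch below. `CHECK` is a hypothetical helper macro modelled on the `gpuAssert` pattern from the linked error-checking question; it is not part of the CUDA API. The bounds check and the extra `n` kernel parameter are also additions that were not in the original.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with the error string, file and line on failure.
#define CHECK(call)                                                \
  do {                                                             \
    cudaError_t e_ = (call);                                       \
    if (e_ != cudaSuccess) {                                       \
      fprintf(stderr, "GPUassert: %s %s %d\n",                     \
              cudaGetErrorString(e_), __FILE__, __LINE__);         \
      exit(EXIT_FAILURE);                                          \
    }                                                              \
  } while (0)

static const int N = 3;

__global__ void VecAdd(const float* A, const float* B, float* C, int n)
{
  int i = threadIdx.x;
  if (i < n) C[i] = A[i] + B[i];  // guard against surplus threads
}

int main()
{
  float *A, *B, *C;
  float a[N] = {1, 2, 3}, b[N] = {4, 5, 6}, c[N] = {0, 0, 0};
  const size_t bytes = sizeof(float) * N;

  CHECK(cudaMalloc((void**)&A, bytes));
  CHECK(cudaMalloc((void**)&B, bytes));
  CHECK(cudaMalloc((void**)&C, bytes));

  CHECK(cudaMemcpy(A, a, bytes, cudaMemcpyHostToDevice));
  CHECK(cudaMemcpy(B, b, bytes, cudaMemcpyHostToDevice));

  VecAdd<<<1, N>>>(A, B, C, N);
  CHECK(cudaGetLastError());       // launch errors
  CHECK(cudaDeviceSynchronize());  // errors during kernel execution

  // Destination is the host array c, source is the device buffer C.
  CHECK(cudaMemcpy(c, C, bytes, cudaMemcpyDeviceToHost));

  printf("%f %f %f\n", c[0], c[1], c[2]);

  CHECK(cudaFree(A));
  CHECK(cudaFree(B));
  CHECK(cudaFree(C));
  return 0;
}
```

On a machine with a working GPU and driver this should print `5.000000 7.000000 9.000000`; if the driver is too old, the first `CHECK` fails instead of the program silently printing zeros.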
sgarizvi
  • Indeed! But when I make that change, the result is the same: the vector c still prints as zeros. – user3195869 Feb 24 '14 at 14:06
  • Add [error checking](http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api) to your code. There may be several reasons for unexpected output. Which GPU do you have? What compute capability are you compiling the code for? – sgarizvi Feb 24 '14 at 14:44
  • My machine has a G210M and a 9400M G. I'm not as certain about the 9400M G, but the G210M is listed as having a compute capability of 1.1, so that's what I've compiled for. This is the command line I've been using: `nvcc cuda-test.cu -o cuda-test --gpu-code compute_11 --gpu-architecture=compute_11` – user3195869 Feb 24 '14 at 15:10
  • The error checking revealed a lot: `GPUassert: CUDA driver version is insufficient for CUDA runtime version cuda-test.cu 30`. It seems pretty clear at this point that there's an issue with the drivers. – user3195869 Feb 24 '14 at 20:35
  • After reinstalling my drivers, the cuda code runs as expected. Thanks! – user3195869 Feb 24 '14 at 21:49
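The two questions this thread ended up hinging on, which compute capability each GPU actually reports and whether the installed driver is new enough for the toolkit's runtime, can both be answered by the runtime API itself rather than by spec listings. Below is a minimal diagnostic sketch using `cudaDriverGetVersion`, `cudaRuntimeGetVersion`, `cudaGetDeviceCount` and `cudaGetDeviceProperties`; the `printVersion` helper is hypothetical, and the "driver older than runtime" comparison is only a rough heuristic.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper. Versions are encoded as 1000 * major + 10 * minor,
// e.g. 5050 means CUDA 5.5.
static void printVersion(const char* label, int v)
{
  printf("%s CUDA %d.%d\n", label, v / 1000, (v % 1000) / 10);
}

int main()
{
  int driverVersion = 0, runtimeVersion = 0;
  cudaDriverGetVersion(&driverVersion);   // stays 0 if no driver is installed
  cudaRuntimeGetVersion(&runtimeVersion);
  printVersion("driver supports up to", driverVersion);
  printVersion("runtime linked against", runtimeVersion);

  // Rough check: a driver older than the runtime is the usual cause of
  // "CUDA driver version is insufficient for CUDA runtime version".
  if (driverVersion < runtimeVersion)
    printf("driver is older than the runtime; update the display driver\n");

  int count = 0;
  cudaError_t err = cudaGetDeviceCount(&count);
  if (err != cudaSuccess) {
    // With a mismatched driver this call already fails.
    printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
    return 1;
  }

  // List every device the runtime can see, with its compute capability.
  for (int dev = 0; dev < count; ++dev) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;
    printf("device %d: %s, compute capability %d.%d\n",
           dev, prop.name, prop.major, prop.minor);
  }
  return 0;
}
```

Running something like this before the real program would have surfaced the driver problem directly, instead of leaving it to show up as a kernel that seemed never to run.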