
I am running a CUDA vector addition program and getting zeros as the output, including for the sum I compute afterwards. I have tried debugging but cannot find the problem. It should be adding the numbers, but instead it simply prints zeros, and I don't understand why.

I have tried everything I can think of in the code and still get no meaningful output.

```cpp
#include <cmath>
#include <cstdlib>
#include <iostream>

using namespace std;

__global__ void vecADDKernal(double *A, double *B, double *C, int n){
    int id = blockIdx.x*blockDim.x+threadIdx.x;
    if(id<n) C[id] = A[id] + B[id];
}


int main( ){
    int n = 1048576;
    int size = n*sizeof(double);
    double *d_A, *d_B;
    double *d_C;
    double *h_A, *h_B, *h_C;

    h_A = (double*)malloc(size);
    h_B = (double*)malloc(size);
    h_C = (double*)malloc(size);

    cudaMalloc(&d_A, size);
    cudaMalloc(&d_B, size);
    cudaMalloc(&d_C, size);


    int i;
    // Initialize vectors on host
    for( i = 0; i < n; i++ ) {
        h_A[i] = 2*i;
        h_B[i] = 3*i;
    }

    cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_B, h_B, size, cudaMemcpyHostToDevice);

    int blockSize = 256;

    // Number of thread blocks in grid
    int gridSize = ceil(n/blockSize);

    vecADDKernal<<<gridSize, blockSize>>>(d_A, d_B, d_C, n);

    cudaMemcpy(h_C, d_C, size, cudaMemcpyDeviceToHost);

    double sum = 0;
    for(int a = 0; a<n; a++) {
        sum = h_C[a];
        cout<<h_C[a]<<endl;
    }

    cout<<"HI "<< sum <<endl;
    cudaFree(d_A);
    cudaFree(d_B);
    cudaFree(d_C);
    return 0;
}
```
  • Add [proper CUDA error checking](https://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api) to your code. – Robert Crovella Aug 15 '22 at 13:55
  • I deleted my answer because it was irrelevant with `n=1048576`, which is a multiple of `blockSize` (`256`). But as a side note: `ceil(n/blockSize)` should be changed to `ceil((double)n/blockSize)` if you want `ceil` to have an effect when `n` is not a multiple of `blockSize`. – wohlstad Aug 15 '22 at 16:54
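The side note above comes down to integer division: `n/blockSize` is evaluated in `int` arithmetic and truncated before `ceil` ever sees it. Besides the `(double)` cast, a common alternative is integer ceiling division. The following is a minimal sketch; the helper names `gridSizeFor` and `truncatedGridSize` are hypothetical and not part of the question's code.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not in the question): number of blocks needed so
// that gridSize * blockSize >= n, using integer ceiling division.
// Assumes n + blockSize - 1 does not overflow int.
int gridSizeFor(int n, int blockSize) {
    assert(n >= 0 && blockSize > 0);
    return (n + blockSize - 1) / blockSize;
}

// The expression from the question: n / blockSize is evaluated in int
// arithmetic, so the remainder is discarded before ceil() is called.
int truncatedGridSize(int n, int blockSize) {
    return static_cast<int>(std::ceil(n / blockSize));
}
```

With the question's `n = 1048576`, both helpers return 4096 because `n` is a multiple of 256, so this was not the cause of the zeros. With, say, `n = 1000`, the truncated form gives 3 blocks (768 threads) and leaves elements 768 to 999 unprocessed, while the ceiling division gives 4. The kernel's `if(id<n)` guard takes care of the surplus threads in the last block.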

I just ran your code as is (only adding the necessary includes) and got non-zero output. Have you verified that your device can run the NVIDIA-provided `vectorAdd` sample successfully?

The sample does exactly what you are trying to do (vector addition), but with proper error checking and result verification.
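A host-side result check in the same spirit can be sketched as follows. The function name `verifyVecAdd` and the default tolerance are assumptions for illustration, not the sample's actual code; the idea is to call it on `h_A`, `h_B` and `h_C` after the device-to-host `cudaMemcpy`.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>

// Hypothetical helper: returns the index of the first element where
// C[i] differs from A[i] + B[i] by more than tol, or -1 if all match.
long long verifyVecAdd(const double *A, const double *B, const double *C,
                       long long n, double tol = 1e-9) {
    assert(n >= 0);
    for (long long i = 0; i < n; ++i) {
        if (std::fabs(A[i] + B[i] - C[i]) > tol) {
            std::fprintf(stderr, "Mismatch at %lld: %f + %f != %f\n",
                         i, A[i], B[i], C[i]);
            return i;
        }
    }
    return -1;  // every element matched
}
```

For the all-zero output described in the question, `verifyVecAdd(h_A, h_B, h_C, n)` would report index 1 immediately (index 0 passes because `0 + 0 == 0`). It only tells you *that* the result is wrong; error checking on the CUDA calls is what tells you *why*.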

A few notes:

  • The first statement in your output loop assigns (`=`) a value to `sum` instead of adding it (`+=`), so `sum` ends up holding only the last element rather than the accumulated total.

  • Proper error checking helps, even with trivial examples. The sample
    provides an example, as does the answer Robert linked.

  • Did you try opening the memory debugger to check whether the values were in fact 0 in memory? Printing to the console is another place where things can go wrong.

  • You can use `std::vector` to store your host data. You can access the raw
    array for `cudaMemcpy` calls with `vector.data()`, and vectors give you easy access to many useful things, such as range-based for loops and the `std::accumulate` and `std::fill` functions.
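The first and last notes can be combined in a short host-side sketch. It assumes the device result has already been copied into a `std::vector<double>`, for example with `cudaMemcpy(h_C.data(), d_C, size, cudaMemcpyDeviceToHost)`; that call is left out so the snippet compiles without the CUDA toolkit. The helper names `fillInputs`, `fillSentinel`, `sumResults` and `sumResultsAccumulate` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: same initialization as the question
// (A[i] = 2*i, B[i] = 3*i), but with std::vector instead of malloc.
void fillInputs(std::vector<double> &A, std::vector<double> &B) {
    assert(A.size() == B.size());
    for (std::size_t i = 0; i < A.size(); ++i) {
        A[i] = 2.0 * i;
        B[i] = 3.0 * i;
    }
}

// Hypothetical helper: pre-fill the host output with a sentinel so that a
// failed device-to-host copy is distinguishable from a kernel that wrote 0.
void fillSentinel(std::vector<double> &C, double sentinel = -1.0) {
    std::fill(C.begin(), C.end(), sentinel);
}

// Accumulate with += instead of overwriting with =.
double sumResults(const std::vector<double> &C) {
    double sum = 0.0;
    for (double value : C) {  // range-based for
        sum += value;
    }
    return sum;
}

// Equivalent with std::accumulate; the initial value must be 0.0 (a double),
// otherwise the running total is accumulated as int and truncated.
double sumResultsAccumulate(const std::vector<double> &C) {
    return std::accumulate(C.begin(), C.end(), 0.0);
}
```

With the question's initialization every `h_C[i]` should equal `5*i`, so for `n = 1048576` the correctly accumulated total is 5·n(n−1)/2 = 2748776448000. All partial sums are integers below 2^53, so a `double` accumulates them exactly.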
