
I am just starting out with CUDA and C and am trying to do a simple addition. When I print the result, I get the following output: "3 + 4 is 1"

To compile the code, I run the command "nvcc test.cu", which generates a.out.

Thanks for your help.

Here is test.cu:

#include <stdio.h>

__global__ void add(int a, int b, int *c){
        *c = a + b;
}

int main(){
        int a,b,c;
        int *dev_c;

        a=3;
        b=4;

        cudaMalloc((void**)&dev_c, sizeof(int));
        add<<<1,1>>>(a,b,dev_c);
        cudaMemcpy(&c, dev_c, sizeof(int), cudaMemcpyDeviceToHost);
        printf("%d + %d is %d\n", a, b, c);
        cudaFree(dev_c);

        return 0;
}
talonmies
LegacyBear
  • It might interest you to see the value of `int *dev_c;` and the value it points to, *before* passing it to `cudaMemcpy`. I don't know, just a debugging suggestion. – Weather Vane Aug 19 '17 at 21:10
  • For fast and simple debugging you can also move the printf to the kernel to see if it is really executed and the parameters are correct. – dari Aug 19 '17 at 21:13
  • @dari Should `printf` not be used in a kernel? – Weather Vane Aug 19 '17 at 21:17
  • There is nothing wrong with your code per se. It is valid, and when I compile and run it on my machine I get `3 + 4 is 7`. So I think the problem is probably in your machine setup (e.g. CUDA not installed properly). When you're having trouble with CUDA code, it's good practice to use [proper CUDA error checking](https://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api) and also to run your code with `cuda-memcheck`. If there is a machine setup issue, these methods will likely give you an indication of it. – Robert Crovella Aug 19 '17 at 21:58
  • Copy a, b, and c to the device to conduct the computation. – ztdep Jun 03 '18 at 06:38
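
The error-checking advice in the comments can be sketched roughly as follows. This is only an illustration, not the code from the linked answer: the `CUDA_CHECK` macro and the `checkedAdd` wrapper are hypothetical names introduced here, while `cudaGetLastError`, `cudaDeviceSynchronize`, and `cudaGetErrorString` are standard CUDA runtime calls.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper macro (not from the original post): if a CUDA runtime
// call fails, print where and why, then return the error to the caller.
#define CUDA_CHECK(call)                                             \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",             \
                    __FILE__, __LINE__, cudaGetErrorString(err_));   \
            return err_;                                             \
        }                                                            \
    } while (0)

// Same kernel as in the question.
__global__ void add(int a, int b, int *c) {
    *c = a + b;
}

// Hypothetical wrapper: runs the question's steps and checks each one.
// Note: on failure this sketch returns early without freeing dev_c.
cudaError_t checkedAdd(int a, int b, int *result) {
    int *dev_c = NULL;

    CUDA_CHECK(cudaMalloc((void **)&dev_c, sizeof(int)));

    add<<<1, 1>>>(a, b, dev_c);
    CUDA_CHECK(cudaGetLastError());       // errors from the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    CUDA_CHECK(cudaMemcpy(result, dev_c, sizeof(int), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(dev_c));
    return cudaSuccess;
}
```

Calling `checkedAdd(3, 4, &c)` from `main` and printing `c` only when it returns `cudaSuccess` should make a setup problem (missing driver, wrong architecture, no device) show up as an explicit error message instead of a garbage result.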

1 Answer


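For debugging purposes, you can use printf inside the kernel to confirm that it actually runs and receives the right arguments. Note that dev_c is an ordinary device pointer returned by cudaMalloc, so the copy below is written correctly; I think the problem is that this cudaMemcpy call (or an earlier CUDA call) is failing at runtime, which leaves c uninitialized: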

cudaMemcpy(&c, dev_c, sizeof(int), cudaMemcpyDeviceToHost);
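
A minimal sketch of the printf-in-kernel suggestion, assuming a device that supports device-side printf (compute capability 2.0 or later). The names `addVerbose` and `runAddVerbose` are hypothetical and are not part of the question's code.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical variant of the question's kernel that also prints its inputs
// and result from the device.
__global__ void addVerbose(int a, int b, int *c) {
    *c = a + b;
    printf("kernel: a=%d b=%d sum=%d\n", a, b, *c);
}

// Hypothetical host wrapper: launches the kernel, waits for it, and copies
// the result back, returning the first error encountered.
cudaError_t runAddVerbose(int a, int b, int *result) {
    int *dev_c = NULL;
    cudaError_t err = cudaMalloc((void **)&dev_c, sizeof(int));
    if (err != cudaSuccess) return err;

    addVerbose<<<1, 1>>>(a, b, dev_c);
    err = cudaGetLastError();             // did the launch itself succeed?
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();    // wait for the kernel; device printf output is flushed here
    if (err == cudaSuccess)
        err = cudaMemcpy(result, dev_c, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dev_c);
    return err;
}
```

If the line beginning with `kernel:` never appears, the kernel most likely did not run at all, which points to an installation or device problem rather than to the addition or the copy.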

dk1111