Why is my cuda sum program not working on the device end?

Question

I'm writing a simple sum program in CUDA, and it keeps printing "The answer is 5" when the answer should be 14. I think that means the result isn't being copied from the device back to the host, but I'm not sure. Here's the program, called `newsum.cu`:

```cpp
#include <iostream>
#include <cuda.h>

using namespace std;

__global__ void AddIntsCUDA(int* a, int* b)
{
    a[0] += b[0];
}

int main()
{
    int a = 5, b = 9;
    int *d_a, *d_b;

    // maybe put (void **) before &d_a in cudaMalloc?
    cudaMalloc(&d_a, sizeof(int));
    cudaMalloc(&d_b, sizeof(int));

    cudaMemcpy(d_a, &a, sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, &b, sizeof(int), cudaMemcpyHostToDevice);

    AddIntsCUDA<<<1, 1>>>(d_a, d_b);

    cudaMemcpy(&a, d_a, sizeof(int), cudaMemcpyDeviceToHost);

    cout << "The answer is " << a << endl;

    cudaFree(d_a);
    cudaFree(d_b);

    return 0;
}
```

And here's the makefile I'm using to compile it:

```makefile
# A simple CUDA makefile
#
# Author: Naga Kandasamy
# Date: 9/16/2015
#
# CUDA depends on two things:
#  1) The CUDA nvcc compiler, which needs to be on your path,
#       or called directly, which we do here
#  2) The CUDA shared library being available at runtime,
#       which we make available by setting the LD_LIBRARY_PATH
#       variable for the duration of the makefile.
#
#   Note that you can set your PATH and LD_LIBRARY_PATH variables as part of your
# .bash_profile so that you can compile and run without using this makefile.

NVCCFLAGS       := -O3 -gencode arch=compute_30,code=sm_30
NVCC            := /usr/local/cuda/bin/nvcc
LD_LIBRARY_PATH := /usr/local/cuda/lib64

all: newsum

newsum: newsum.cu
	$(NVCC) -o newsum newsum.cu $(NVCCFLAGS)

clean:
	rm newsum
```

Why am I getting the wrong answer?

Your program works correctly for me and prints "The answer is 14". Whenever you have trouble with CUDA code, it's good practice to use [proper cuda error checking](http://stackoverflow.com/questions/14038589) and to run your code with `cuda-memcheck` (e.g., on Linux, `cuda-memcheck ./newsum`). These are useful troubleshooting tools, and they will also point out problems with your platform (no GPU, an incorrect CUDA install, compiling for the wrong architecture such as `compute_30,sm_30` when that doesn't match your actual GPU, etc.), which is likely the problem here. What GPU do you have? — Robert Crovella, Nov 22 '15 at 21:34
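The error-checking advice above can be applied to the question's program roughly as follows. This is a minimal sketch, not the code from the linked answer: the macro `CUDA_RETURN_IF_ERROR` and the function `checkedSum` are hypothetical names introduced here, and the file is assumed to be compiled with `nvcc`. A kernel launch does not return an error code itself, so the sketch calls `cudaGetLastError()` right after the launch (where an "invalid device function" failure is reported) and `cudaDeviceSynchronize()` to catch errors raised while the kernel runs.

```cpp
// Assumed to be compiled with nvcc, e.g.: nvcc -o newsum_checked newsum_checked.cu
#include <cstdio>
#include <cuda_runtime.h>

__global__ void AddIntsCUDA(int* a, int* b)
{
    a[0] += b[0];
}

// Hypothetical helper macro: if a CUDA runtime call fails, print the call,
// its location and the error string, then return the error to the caller.
#define CUDA_RETURN_IF_ERROR(call)                                      \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,    \
                         __LINE__, #call, cudaGetErrorString(err_));    \
            return err_;                                                \
        }                                                               \
    } while (0)

// Hypothetical helper: computes a += b on the GPU and returns the first CUDA
// error encountered (cudaSuccess if every step worked). For brevity, device
// memory is not freed on the error paths.
cudaError_t checkedSum(int& a, int b)
{
    int *d_a = nullptr, *d_b = nullptr;
    CUDA_RETURN_IF_ERROR(cudaMalloc(&d_a, sizeof(int)));
    CUDA_RETURN_IF_ERROR(cudaMalloc(&d_b, sizeof(int)));
    CUDA_RETURN_IF_ERROR(cudaMemcpy(d_a, &a, sizeof(int), cudaMemcpyHostToDevice));
    CUDA_RETURN_IF_ERROR(cudaMemcpy(d_b, &b, sizeof(int), cudaMemcpyHostToDevice));

    AddIntsCUDA<<<1, 1>>>(d_a, d_b);
    CUDA_RETURN_IF_ERROR(cudaGetLastError());      // launch errors, e.g. invalid device function
    CUDA_RETURN_IF_ERROR(cudaDeviceSynchronize()); // errors raised while the kernel runs

    CUDA_RETURN_IF_ERROR(cudaMemcpy(&a, d_a, sizeof(int), cudaMemcpyDeviceToHost));
    CUDA_RETURN_IF_ERROR(cudaFree(d_a));
    CUDA_RETURN_IF_ERROR(cudaFree(d_b));
    return cudaSuccess;
}
```

Calling `checkedSum(a, 9)` with `a = 5` should either leave 14 in `a` and return `cudaSuccess`, or print which call failed and return its error code instead of silently printing 5. With an `sm_30`-only build on a 2.x GPU, the failure would be expected at the `cudaGetLastError()` check.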
I'm connected over SSH to a remote host with a GTX 460. I tried `cuda-memcheck ./newsum`:

```
== CUDA-MEMCHECK
The answer is 5
== Program hit cudaErrorInvalidDeviceFunction (error 8) due to "invalid device function" on CUDA API call to cudaLaunch.
== Saved host backtrace up to driver entry point at error
== Host Frame:/usr/lib64/libcuda.so.1 [0x2f2d83]
== Host Frame:./newsum [0x3195e]
== Host Frame:./newsum [0x2c16]
== Host Frame:/lib64/libc.so.6 (__libc_start_main + 0xfd) [0x1ed5d]
== Host Frame:./newsum [0x27a9]
```

How do I know what architecture to compile for? — JoshBreece, Nov 22 '15 at 22:25
Your GTX 460 is a compute capability 2.1 (Fermi) device. You can find this out by running the `deviceQuery` CUDA sample, or by looking it up. "Invalid device function" is the typical error when you have compiled for an architecture that your device can't run, which is the case here. You can fix it by changing `compute_30` to `compute_20` and `sm_30` to `sm_20` wherever they appear in your Makefile; code built for `sm_20` runs on any 2.x device, including yours. — Robert Crovella, Nov 23 '15 at 00:33
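Why `sm_30` fails on a 2.x device follows from CUDA's compatibility rules: SASS (the `code=sm_XY` part of `-gencode`) built for compute capability X.Y only runs on devices with the same major version X and a minor version of at least Y, while PTX (`code=compute_XY`) can be JIT-compiled by the driver for any device of compute capability X.Y or higher. The Makefile's `-gencode arch=compute_30,code=sm_30` embeds only `sm_30` SASS and no PTX, so the binary contains nothing the GTX 460 (compute capability 2.1) can run. The plain C++ sketch below models those two rules; `ComputeCapability`, `sassRunsOn`, `ptxRunsOn` and `fatbinRunsOn` are hypothetical names, and the model deliberately ignores details such as driver version limits and architecture-specific targets.

```cpp
#include <vector>

// Hypothetical type: compute capability X.Y of a device, or the X.Y of an
// sm_XY / compute_XY compilation target.
struct ComputeCapability {
    int major;
    int minor;
};

// SASS (cubin) built for sm_XY runs only on devices of compute capability X.Z with Z >= Y.
bool sassRunsOn(ComputeCapability target, ComputeCapability device)
{
    return target.major == device.major && device.minor >= target.minor;
}

// PTX built for compute_XY can be JIT-compiled for any device of compute capability >= X.Y.
bool ptxRunsOn(ComputeCapability target, ComputeCapability device)
{
    return device.major > target.major ||
           (device.major == target.major && device.minor >= target.minor);
}

// A fat binary is usable if at least one embedded SASS or PTX image fits the device.
bool fatbinRunsOn(const std::vector<ComputeCapability>& sassTargets,
                  const std::vector<ComputeCapability>& ptxTargets,
                  ComputeCapability device)
{
    for (const ComputeCapability& t : sassTargets)
        if (sassRunsOn(t, device)) return true;
    for (const ComputeCapability& t : ptxTargets)
        if (ptxRunsOn(t, device)) return true;
    return false;
}
```

Modelling the GTX 460 as `{2, 1}`, `fatbinRunsOn({{3, 0}}, {}, {2, 1})` is false (the original `compute_30,sm_30` build), while `fatbinRunsOn({{2, 0}}, {}, {2, 1})` is true (the `compute_20,sm_20` build).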