I have a CUDA program:
#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>

// Each thread adds 1.0f to one element of the array.
__global__ void array_add(float *a, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N) a[idx] = a[idx] + 1.0f;
}

int main(void)
{
    float *a_h, *a_d;   // host and device copies of the array
    const int N = 10;
    size_t size = N * sizeof(float);
    a_h = (float *)malloc(size);
    cudaMalloc((void **) &a_d, size);
    for (int i = 0; i < N; i++) a_h[i] = (float)i;
    cudaMemcpy(a_d, a_h, size, cudaMemcpyHostToDevice);
    int block_size = 4;
    // Round up so that every element is covered by a thread.
    int n_blocks = N/block_size + (N%block_size == 0 ? 0 : 1);
    array_add <<< n_blocks, block_size >>> (a_d, N);
    cudaMemcpy(a_h, a_d, sizeof(float)*N, cudaMemcpyDeviceToHost);
    for (int i = 0; i < N; i++) printf("%d %f\n", i, a_h[i]);
    free(a_h);
    cudaFree(a_d);
    return 0;
}
Actually, I did not write this; it is a test program. But I do understand what it is supposed to do.
I saved this file as cuda.cu and compiled it on my (Linux Mint - Ubuntu 15.04) system with nvcc cuda.cu.
It runs, but the output is the following:
./a.out
0 0.000000
1 1.000000
2 2.000000
3 3.000000
4 4.000000
5 5.000000
6 6.000000
7 7.000000
8 8.000000
9 9.000000
This isn't what I expected to see. I expected the value 1.0f to be added to each of these values, but this doesn't appear to happen.
What is going wrong here?
CUDA Memcheck Results
Here is the output of running the program under CUDA Memcheck:
========= CUDA-MEMCHECK
========= Program hit cudaErrorInvalidDeviceFunction (error 8) due to "invalid device function " on CUDA API call to cudaLaunch.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x2ef313]
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcudart.so.6.5 (cudaLaunch + 0x17e) [0x3686e]
========= Host Frame:./a.out [0xe02]
========= Host Frame:./a.out [0xd01]
========= Host Frame:./a.out [0xd23]
========= Host Frame:./a.out [0xbbd]
========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf0) [0x20a40]
========= Host Frame:./a.out [0x999]
=========
0 0.000000
1 1.000000
2 2.000000
3 3.000000
4 4.000000
5 5.000000
6 6.000000
7 7.000000
8 8.000000
9 9.000000
========= ERROR SUMMARY: 1 error
I guess something is wrong with my setup. Any ideas on what it could be and how to fix it?
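The same failure can also be made visible from inside the program, without CUDA Memcheck. The following is a minimal sketch, assuming the kernel from the listing above; the CHECK macro and the ceil_div helper are hypothetical names introduced here for illustration and are not part of the CUDA API. It checks the return code of every runtime call and asks for the launch status right after the kernel launch.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Hypothetical helper macro (not part of CUDA): print the error string
// and exit if a runtime API call returns anything but cudaSuccess.
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,    \
                    cudaGetErrorString(err_));                    \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

// Hypothetical helper: integer division rounded up, equivalent to
// N/block_size + (N%block_size == 0 ? 0 : 1) for positive inputs.
static int ceil_div(int n, int d)
{
    return (n + d - 1) / d;
}

__global__ void array_add(float *a, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N) a[idx] = a[idx] + 1.0f;
}

int main(void)
{
    const int N = 10;
    size_t size = N * sizeof(float);
    float a_h[N];
    float *a_d = NULL;
    for (int i = 0; i < N; i++) a_h[i] = (float)i;

    CHECK(cudaMalloc((void **)&a_d, size));
    CHECK(cudaMemcpy(a_d, a_h, size, cudaMemcpyHostToDevice));

    int block_size = 4;
    int n_blocks = ceil_div(N, block_size);
    array_add<<<n_blocks, block_size>>>(a_d, N);
    CHECK(cudaGetLastError());       // errors from the launch itself
    CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    CHECK(cudaMemcpy(a_h, a_d, size, cudaMemcpyDeviceToHost));
    for (int i = 0; i < N; i++) printf("%d %f\n", i, a_h[i]);
    CHECK(cudaFree(a_d));
    return 0;
}
```

With the setup problem shown in the Memcheck output, I would expect the cudaGetLastError() check to stop the program with an "invalid device function" message rather than letting it print the unchanged values.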
GPU Type
I am running this on a GTX 260. My problem was solved by compiling with nvcc cuda.cu -arch=sm_11. Thanks to Robert for this.
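To choose a value for -arch without guessing, the compute capability of the installed GPU can be queried through the runtime API. This is a rough sketch, not a definitive tool: arch_flag is a hypothetical helper made up for this example, and it assumes the installed nvcc still supports the architecture that the device reports.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper: format an nvcc -arch flag from a compute
// capability, e.g. major 1, minor 3 gives "-arch=sm_13".
static void arch_flag(int major, int minor, char *buf, size_t len)
{
    snprintf(buf, len, "-arch=sm_%d%d", major, minor);
}

int main(void)
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < count; dev++) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceProperties: %s\n",
                    cudaGetErrorString(err));
            return 1;
        }
        char flag[32];
        arch_flag(prop.major, prop.minor, flag, sizeof flag);
        printf("device %d: %s, compute capability %d.%d, try: nvcc %s\n",
               dev, prop.name, prop.major, prop.minor, flag);
    }
    return 0;
}
```

A GTX 260 is a compute capability 1.3 device, so this should suggest sm_13 there; a lower 1.x target such as sm_11 also works because code built for it can run on devices with a higher minor revision of the same major version.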