I have been trying out CUDA recently and ran into a problem with the following CUDA kernel:
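```cuda
__global__ void addKernel(float *c, const float *a, const float *b, int nsize)
{
    // Flatten the 2-D grid into a linear block index, then a global thread index
    int blockID = blockIdx.x + blockIdx.y * gridDim.x;
    int i = blockID * blockDim.x + threadIdx.x;
    if (i < nsize) {
        c[i] = a[i] + b[i];
    }
    float k = c[i];  // the statement I added; the kernel fails once this is present
}
```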
This kernel performs a simple element-wise vector addition. It works fine without the last statement, `float k = c[i];`. But after I added this statement, I get an `unspecified launch failure` error when I run the code. Can anyone tell me what's wrong with this kernel?