I have been trying out CUDA recently and ran into a problem with the following kernel.

__global__ void addKernel(float *c, const float *a, const float *b, int nsize)
{   
    int blockID = blockIdx.x + blockIdx.y*gridDim.x;
    int i = blockID*blockDim.x+threadIdx.x;
    if (i < nsize){
        c[i] = a[i] + b[i];
    }
    float k = c[i]; 
}

This kernel performs a simple vector addition. It works fine without the last statement, float k = c[i];. After I added that statement, I get an "unspecified launch failure" error when I run the code. Can anyone tell me what's wrong with this kernel?

andy90

1 Answer

You really should show complete code, including the actual device memory allocations and the way you are launching this kernel (blocks, threads, etc.). But very likely you are launching more threads than the work size (i.e. the vector length) strictly requires, so that the whole vector is covered. That's a fairly common CUDA practice.

When you do that, it's customary to include a thread-check in your kernel:

if (i < nsize){

to make sure that the i values that actually get used for indexing are valid (i.e. within the vector length).

But then you've broken things by including this statement outside the thread-check (i.e. outside the body of the if-statement):

float k = c[i]; 

Now, for every computed i in your kernel, an attempt will be made to index into the c vector at that location, even if i is greater than or equal to nsize, which is presumably the length of the c vector.

So most likely this statement is indexing out-of-range for the c vector allocation. You can confirm this with a bit more debugging, perhaps using a method such as what is described here.
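As a minimal sketch of the fix, the read of c[i] can simply be moved inside the thread-check. The kernel below keeps the question's indexing; the host helper addVectors, the block size of 256, and the 1D launch are assumptions made for illustration, since the question does not show its allocations or launch configuration.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Same indexing as the question's kernel, but every access to c[i]
// now sits inside the bounds check, so surplus threads touch no memory.
__global__ void addKernelChecked(float *c, const float *a, const float *b, int nsize)
{
    int blockID = blockIdx.x + blockIdx.y * gridDim.x;
    int i = blockID * blockDim.x + threadIdx.x;
    if (i < nsize) {
        c[i] = a[i] + b[i];
        float k = c[i];   // safe: i is known to be a valid index here
        (void)k;          // k is unused, as in the question
    }
}

// Hypothetical host helper (not from the question): returns a + b,
// or an empty vector if the inputs mismatch or any CUDA call fails.
std::vector<float> addVectors(const std::vector<float> &a, const std::vector<float> &b)
{
    const int n = static_cast<int>(a.size());
    if (n == 0 || b.size() != a.size()) return {};
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    std::vector<float> c(n);
    bool ok = cudaMalloc(&dA, bytes) == cudaSuccess
           && cudaMalloc(&dB, bytes) == cudaSuccess
           && cudaMalloc(&dC, bytes) == cudaSuccess
           && cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const int threads = 256;                        // assumed block size
        const int blocks = (n + threads - 1) / threads; // rounds up, so some threads may get i >= n
        addKernelChecked<<<blocks, threads>>>(dC, dA, dB, n);
        ok = cudaGetLastError() == cudaSuccess          // errors from the launch itself
          && cudaDeviceSynchronize() == cudaSuccess     // errors raised while the kernel ran
          && cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    if (!ok) c.clear();
    return c;
}
```

Calling addVectors with two 1000-element vectors launches 4 blocks of 256 threads, so 24 threads compute an i that is out of range; with the read inside the if-body they do nothing. Checking both cudaGetLastError() and cudaDeviceSynchronize() after the launch is one way to see where a failure like the one in the question is reported.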

Robert Crovella
  • Thanks! I understand my mistake now. – andy90 Jul 06 '16 at 21:49
  • As an aside, the error in this particular case would/should only manifest itself if you are compiling the code with the debug switch (`-G`). In release mode, the compiler will recognize that the final statement in the kernel affects no global state and will optimize it away. Nevertheless, the code is broken as written. – Robert Crovella Jul 07 '16 at 00:04