CUDA atomicAdd using thread.x not returning expected results

Question

I've been experimenting with atomic operations in CUDA, but I can't get thread index values to take part in the operations. They appear to be treated as zeros, as in the examples below.

Is there anything I'm doing wrong in the code below?

Code 1: adding the thread index to dest[10]. This does not work: dest[10] is 0 after the run, whereas I would expect it to be greater than 0, since each thread should add its index to dest[10].

__global__ void add_test(int* dest, float *a, float *b, float *c)
{
    int ix = ((blockIdx.x * blockDim.x) + threadIdx.x);
    int idx = threadIdx.x;
    atomicAdd(dest+10,idx);
}

Code 2: with a constant it seems to work. At the end of the run dest[10] is 2, but I would expect it to be much greater than 2, since every thread in every block should add 2:

__global__ void add_test(int* dest, float *a, float *b, float *c)
{
    int ix = ((blockIdx.x * blockDim.x) + threadIdx.x);
    int idx = threadIdx.x;
    atomicAdd(dest+10,2);
}

My test call looks like:

add_test<<<(1024,1,1), (41,1584,1)>>>

score 2 · Answer 1 · edited May 23 '17 at 12:15

This isn't a valid kernel launch:

add_test<<<(1024,1,1), (41,1584,1)>>>

You cannot ask for thread block dimensions of (41,1584,1).

My guess is that you are not doing proper CUDA error checking and have not run your code with cuda-memcheck. Either of these would have reported the error and shown that your kernel is not running properly.
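As a rough illustration of the error checking mentioned above, the sketch below launches a kernel with the block dimensions from the question and inspects the result of the launch. The `check` helper, the single-argument kernel signature, and the 16-element buffer are assumptions made for this example, not part of the original code. On a typical GPU the launch check should fail with an "invalid configuration argument" style message, though the exact wording depends on the CUDA version.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Simplified version of the kernel from the question (unused parameters dropped).
__global__ void add_test(int *dest)
{
    atomicAdd(dest + 10, (int)threadIdx.x);
}

// Hypothetical helper: print a message and return false if a CUDA call failed.
static bool check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main()
{
    int *dest = nullptr;
    if (!check(cudaMalloc(&dest, 16 * sizeof(int)), "cudaMalloc")) return 1;
    if (!check(cudaMemset(dest, 0, 16 * sizeof(int)), "cudaMemset")) return 1;

    // Block of 41 x 1584 threads, as in the question: too large for one block.
    add_test<<<dim3(1024, 1, 1), dim3(41, 1584, 1)>>>(dest);

    // cudaGetLastError reports launch-time problems such as a bad configuration;
    // cudaDeviceSynchronize reports errors raised while the kernel was running.
    bool ok = check(cudaGetLastError(), "kernel launch");
    ok = ok && check(cudaDeviceSynchronize(), "kernel execution");

    cudaFree(dest);
    return ok ? 0 : 1;
}
```

Without the two checks after the launch, the program would continue silently and dest[10] would simply keep whatever value it had before.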

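The maximum size of each of the first two block dimensions is 512 or 1024, and the maximum total number of threads per block (the product of the three dimensions) is also 512 or 1024, depending on the GPU. A block of (41,1584,1) fails both limits: 1584 exceeds the per-dimension maximum, and 41 × 1584 = 64,944 threads far exceeds the per-block maximum.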
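To make the limits concrete, here is a minimal sketch that queries them from the device and then runs the kernel with a block size that fits. The helpers `block_fits` and `expected_sum`, the single-argument kernel signature, and the choice of 1024 blocks of 256 threads are all assumptions for illustration. With a valid launch, every thread adds its own `threadIdx.x`, so dest[10] should equal the number of blocks times 0 + 1 + ... + (threads - 1).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Simplified version of the kernel from the question (unused parameters dropped).
__global__ void add_test(int *dest)
{
    atomicAdd(dest + 10, (int)threadIdx.x);
}

// Hypothetical helper: true if the block shape respects the device limits.
static bool block_fits(const cudaDeviceProp &p, dim3 b)
{
    return b.x <= (unsigned)p.maxThreadsDim[0]
        && b.y <= (unsigned)p.maxThreadsDim[1]
        && b.z <= (unsigned)p.maxThreadsDim[2]
        && (unsigned long long)b.x * b.y * b.z
               <= (unsigned long long)p.maxThreadsPerBlock;
}

// Hypothetical helper: sum of threadIdx.x over all threads of a 1-D launch.
static long long expected_sum(int blocks, int threads)
{
    return (long long)blocks * threads * (threads - 1) / 2;
}

int main()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return 1;

    const int blocks = 1024, threads = 256;  // assumed sizes for this example
    dim3 bad(41, 1584, 1), good(threads, 1, 1);
    std::printf("41x1584 fits: %d, %dx1 fits: %d\n",
                (int)block_fits(prop, bad), threads, (int)block_fits(prop, good));

    int *dest = nullptr;
    if (cudaMalloc(&dest, 16 * sizeof(int)) != cudaSuccess) return 1;
    cudaMemset(dest, 0, 16 * sizeof(int));

    add_test<<<blocks, good>>>(dest);
    if (cudaDeviceSynchronize() != cudaSuccess) return 1;

    int host[16];
    cudaMemcpy(host, dest, sizeof(host), cudaMemcpyDeviceToHost);
    std::printf("dest[10] = %d, expected %lld\n",
                host[10], expected_sum(blocks, threads));

    cudaFree(dest);
    return 0;
}
```

Querying `cudaDeviceProp` rather than hard-coding 512 or 1024 keeps the check correct across GPU generations.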

In the future, please provide complete, compilable code when asking for help with code that is not working. SO expects this, and its absence is a valid reason to close your question.
