I've been experimenting with atomic operations in CUDA, but I can't get thread index values to take part in the operations; they appear to be treated as zero, as in the two examples below.
Is there anything wrong with my code?
Code 1: adding the thread index to dest[10]. This does not work: dest[10] is 0 after the run, whereas I would expect it to be greater than 0, since every thread should add its index to dest[10].
__global__ void add_test(int* dest, float *a, float *b, float *c)
{
    int ix = ((blockIdx.x * blockDim.x) + threadIdx.x);  // global index (unused here)
    int idx = threadIdx.x;                               // index within the block
    atomicAdd(dest+10, idx);
}
Code 2: using a constant instead seems to work, but only partly. At the end of the run dest[10] is 2, whereas I would expect it to be much greater than 2, since every thread in every block should add 2:
__global__ void add_test(int* dest, float *a, float *b, float *c)
{
    int ix = ((blockIdx.x * blockDim.x) + threadIdx.x);  // global index (unused here)
    int idx = threadIdx.x;                               // index within the block
    atomicAdd(dest+10, 2);
}
My test call looks like:
add_test<<<(1024,1,1), (41,1584,1)>>>
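One thing that may be worth checking, offered as a guess rather than a confirmed diagnosis: in C++ a parenthesized list such as `(1024,1,1)` is a comma expression that evaluates to its last operand, so the launch above may actually run 1 block of 1 thread. That would match both observations (threadIdx.x is 0 for the only thread, and the constant 2 is added exactly once). Note also that the first launch parameter is the grid size and the second is the block size, and a block of 41 x 1584 threads would exceed the 1024-threads-per-block limit of current GPUs. The sketch below is a minimal, self-contained comparison that builds the configuration with explicit `dim3` objects. The sizes (4 blocks of 256 threads), the 16-element buffer, and the `expected_sum` helper are assumptions chosen for illustration, and the unused `a`, `b`, `c` parameters are dropped.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Same operation as Code 1: every thread adds its threadIdx.x to dest[10].
__global__ void add_test(int *dest)
{
    atomicAdd(dest + 10, static_cast<int>(threadIdx.x));
}

// Hypothetical host-side helper: each block contributes 0 + 1 + ... + (threads - 1).
int expected_sum(int blocks, int threads)
{
    return blocks * (threads * (threads - 1) / 2);
}

int main()
{
    const int blocks = 4;     // illustrative sizes, not taken from the question
    const int threads = 256;  // must not exceed the device's per-block limit

    int *dest = nullptr;
    cudaMalloc(&dest, 16 * sizeof(int));
    cudaMemset(dest, 0, 16 * sizeof(int));

    dim3 grid(blocks, 1, 1);    // explicit dim3 objects instead of "(4,1,1)"
    dim3 block(threads, 1, 1);
    add_test<<<grid, block>>>(dest);

    // Check for launch errors first, then wait for the kernel to finish.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    int result = 0;
    cudaMemcpy(&result, dest + 10, sizeof(int), cudaMemcpyDeviceToHost);
    std::printf("dest[10] = %d, expected %d\n", result, expected_sum(blocks, threads));

    cudaFree(dest);
    return 0;
}
```

With these sizes the reference value is 4 x (255 x 256 / 2) = 130560. Checking `cudaGetLastError()` right after the launch is useful here because an invalid configuration (for example, too many threads per block) makes the kernel silently not run, which also leaves dest[10] at 0.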