
My kernel writes its results to several global device variables, so I need the function that does the writing to be atomic. Is that possible? If not, I am trying to use atomicExch() on every global variable, but some of them are structs or floats, not ints. As far as I know, atomic operations are only for ints. How can I deal with this problem? Thanks.

  • Have you read the [relevant section](http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#atomic-functions) of the programming guide? Some atomic operations can operate on `float` quantities. The exact atomic support will depend on what type of GPU (what compute capability) you are compiling for. Atomics cannot be used on an entire struct, only on POD types. Is your question about how to update multiple variables atomically (ie. all at once)? – Robert Crovella Sep 17 '14 at 04:14
  • Thanks, Robert. My GPU is a GeForce GTX 750 with compute capability 5.0, and the struct is POD. I found that there is an atomicExch for `float`, but as you said, how can I apply it to a POD struct? My first question was whether it is possible to make a user-defined function atomic, or to make a critical section out of several instructions, because I defined a function that assigns the results to global device variables. – user2668204 Sep 17 '14 at 06:42
  • to update an entire structure "atomically", probably using a [critical section code](http://stackoverflow.com/questions/18963293/cuda-atomics-change-flag/18968893#18968893) is one possibility. [This question](http://stackoverflow.com/questions/19363066/how-to-use-atomiccas-for-multiple-variables-with-conditionals-in-cuda) may also be of interest. – Robert Crovella Sep 17 '14 at 13:30
  • Thanks so much, that is a great way to deal with the race condition. Can I ask another question? My kernel fails with cudaErrorLaunchOutOfResources. I thought it was related to overuse of registers per block, but ptxas reports that the kernel uses only 28 registers. The kernel is launched with 32*32 = 1024 threads, so the number of registers used per block is 1024*28 = 28672 < 65536 (GeForce GTX 750), so that is not the cause. Then what is wrong? The ptxas output for that kernel is as follows – user2668204 Sep 18 '14 at 03:16
  • `1> ptxas info : Compiling entry function '_Z9ARM_MatchPjPxiif' for 'sm_50'` `1> ptxas info : Function properties for _Z9ARM_MatchPjPxiif` `1> 224 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads` `1> ptxas info : Used 28 registers, 348 bytes cmem[0]` – user2668204 Sep 18 '14 at 03:17
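Tying the comment thread together: the critical-section approach linked above can guard an update to an entire POD struct with a global lock built from `atomicCAS`. A minimal sketch of that pattern — the struct, lock, and function names here are illustrative, not from the question, and on pre-Volta GPUs like the GTX 750 the `if`-inside-`while` shape matters to avoid intra-warp deadlock:

```cuda
// Sketch of a critical section guarding a multi-field (POD struct) update.
// Names (Result, d_lock, d_result, updateResult) are made up for illustration.
struct Result {            // POD struct updated as a single unit
    float value;
    int   index;
};

__device__ int    d_lock = 0;   // 0 = free, 1 = held
__device__ Result d_result;

__device__ void updateResult(float v, int i)
{
    bool done = false;
    while (!done) {
        // Try to take the lock; atomicCAS returns the old value,
        // so 0 means this thread acquired it.
        if (atomicCAS(&d_lock, 0, 1) == 0) {
            // --- critical section: one thread at a time ---
            d_result.value = v;
            d_result.index = i;
            __threadfence();        // make writes visible before release
            atomicExch(&d_lock, 0); // release the lock
            done = true;
        }
    }
}
```

Note that a lock like this serializes all updating threads, so it should wrap only the few instructions that must appear atomic; where a single `float` is enough, the native `atomicExch(float*, float)` (available on compute capability 5.0) avoids the lock entirely.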

1 Answer


I found the reason the cudaErrorLaunchOutOfResources error was raised. My kernel used 28 registers, but the project settings did not account for that: CUDA C/C++ -> Device -> Max Used Register was set to 0. After changing it to 30, the error disappeared.
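For anyone building outside Visual Studio: that project field corresponds to nvcc's `--maxrregcount` option, and `-Xptxas -v` prints the per-kernel register/stack report quoted in the comments. A hedged equivalent command line (`kernel.cu` and `app` are placeholder names):

```
# Equivalent of CUDA C/C++ -> Device -> Max Used Register = 30 in the VS project.
# -arch=sm_50 matches the GTX 750; -Xptxas -v shows register and stack usage.
nvcc -arch=sm_50 --maxrregcount=30 -Xptxas -v -o app kernel.cu
```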