How to implement a critical section in CUDA?

Question

I implemented a critical section the way many articles present it.

The code skeleton follows.

Declaration of global device variables:

__device__ int gpnIntArray[3200];
__device__ int gnInt, gnLock;

Host code

int nTemp = 0;
cudaMemcpyToSymbol(gnInt, &nTemp, sizeof(int), 0, cudaMemcpyHostToDevice);
cudaMemcpyToSymbol(gnLock, &nTemp, sizeof(int), 0, cudaMemcpyHostToDevice);
RaceConditionSolution<<<100, 32>>>();

Device code

__global__ void RaceConditionSolution()
{
    while (atomicCAS(&gnLock, 0, 1) != 0){}
    gpnIntArray[gnInt] = 1;
    gnInt ++;
    atomicExch(&gnLock, 0);
}

I want to update the global device variables gpnIntArray[3200] and gnInt inside the critical section.

But this code makes my PC freeze.

What is the problem, and how can I solve it in this case?

Thanks in advance.

Threads in a warp execute in lockstep. The threads in a warp entering your while loop must *all* acquire the lock before any can proceed beyond that while loop. Unfortunately this is impossible, and you have deadlock. I suggest you do not arbitrate for locks between threads in a warp as this is quite difficult. Instead use the linked duplicate question/answer, and manage intra-threadblock activity separately. — Robert Crovella, Oct 06 '14 at 18:18
@Robert Crovella, the answer you suggested deals only with the first thread in each threadblock, so when I apply it to my code, gnInt becomes 100, not 3200. I want every thread to take part and expect gnInt to be 3200, so I commented out the line `if (threadIdx.x == 0)`. But that also makes my PC freeze. What can I do? I also want the critical section to be cost-effective. Thanks. — user2665684, Oct 07 '14 at 01:21
In my answer, I explained that you should handle threads in a block separately. Do not force threads in a block to compete for the lock. Compete for the lock between threadblocks. Once a threadblock has acquired a lock, use ordinary threadblock synchronization methods to control which threads have access. Using ordinary threadblock sync methods to control access within the threadblock will be more cost-effective than having every thread compete for a global lock anyway. — Robert Crovella, Oct 07 '14 at 01:34
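
As a minimal sketch of the approach described in this comment (not the commenter's actual code), the kernel below lets only thread 0 of each block compete for the global lock, then uses `__syncthreads()` to give each thread of the block its turn. The kernel name `BlockLevelCriticalSection`, the host helper `runBlockLevelDemo`, the two macros, and the `volatile` access to `gnInt` are assumptions added for illustration.

```cuda
#include <cuda_runtime.h>

#define NUM_BLOCKS 100
#define THREADS_PER_BLOCK 32

__device__ int gpnIntArray[NUM_BLOCKS * THREADS_PER_BLOCK];
__device__ int gnInt, gnLock;

__global__ void BlockLevelCriticalSection()
{
    // Only one thread per block competes for the global lock.
    if (threadIdx.x == 0) {
        while (atomicCAS(&gnLock, 0, 1) != 0) {}
        __threadfence();  // order later reads after the lock acquisition
    }
    __syncthreads();      // the rest of the block waits until the lock is held

    // Read the counter through a volatile pointer so every access goes to
    // memory instead of a stale cached or register copy (an assumption made
    // here for safety across architectures).
    volatile int *pnInt = &gnInt;

    // Serialize the block's threads: thread i takes turn i.
    for (int i = 0; i < blockDim.x; ++i) {
        if (threadIdx.x == i) {
            int n = *pnInt;
            gpnIntArray[n] = 1;
            *pnInt = n + 1;
        }
        __syncthreads();  // executed by every thread on every iteration
    }

    // Publish the block's writes, then release the lock.
    if (threadIdx.x == 0) {
        __threadfence();
        atomicExch(&gnLock, 0);
    }
}

// Hypothetical host helper: resets the globals, launches, returns gnInt.
int runBlockLevelDemo()
{
    int nZero = 0, nResult = -1;
    cudaMemcpyToSymbol(gnInt, &nZero, sizeof(int));
    cudaMemcpyToSymbol(gnLock, &nZero, sizeof(int));
    BlockLevelCriticalSection<<<NUM_BLOCKS, THREADS_PER_BLOCK>>>();
    cudaDeviceSynchronize();
    cudaMemcpyFromSymbol(&nResult, gnInt, sizeof(int));
    return nResult;
}
```

With the 100 x 32 launch from the question, every thread increments the counter exactly once, so `runBlockLevelDemo()` is expected to return 3200 on a working CUDA device.
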
Sorry, Robert. I get what you mean, but I can't figure out how to solve the race condition among threads in the same threadblock. Would you fix my code for me if it doesn't take too much of your time? I would appreciate it. Thanks in advance. — user2665684, Oct 07 '14 at 02:13
[Here](http://pastebin.com/MEV1iWk9) is an example of how to make the code you have shown work. But it is fragile and it may break if you change things. And it is slow. [Here](http://pastebin.com/3G8jFP8q) is an example of how to make my version work. It is faster to use threadblock sync mechanisms than competition for a global lock. And my version is less prone to breakage depending on what you put in the critical section. — Robert Crovella, Oct 07 '14 at 03:13
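
The pastebin contents are not reproduced here. As a rough sketch of one commonly cited way to let every thread compete for the lock, the kernel below puts the acquire, the critical section, and the release inside a single loop body, so a thread that wins the lock can finish and release it before the loop repeats. This is an assumption about the general pattern, not necessarily what the linked code does, and it carries the fragility the comment warns about: whether it avoids deadlock on older GPUs depends on how the compiler lays out the divergent branches. The names `PerThreadCriticalSection` and `runPerThreadDemo` are hypothetical.

```cuda
#include <cuda_runtime.h>

#define NUM_BLOCKS 100
#define THREADS_PER_BLOCK 32

__device__ int gpnIntArray[NUM_BLOCKS * THREADS_PER_BLOCK];
__device__ int gnInt, gnLock;

__global__ void PerThreadCriticalSection()
{
    // Volatile access so the counter is re-read from memory each time
    // (an assumption added for safety, not taken from the original code).
    volatile int *pnInt = &gnInt;

    bool bDone = false;
    while (!bDone) {
        // Acquire, work, and release all happen inside one branch, so the
        // winning thread does not have to leave a spin loop before it can
        // release the lock.
        if (atomicCAS(&gnLock, 0, 1) == 0) {
            int n = *pnInt;
            gpnIntArray[n] = 1;
            *pnInt = n + 1;
            __threadfence();         // make the writes visible before release
            atomicExch(&gnLock, 0);  // release
            bDone = true;
        }
    }
}

// Hypothetical host helper: resets the globals, launches, returns gnInt.
int runPerThreadDemo()
{
    int nZero = 0, nResult = -1;
    cudaMemcpyToSymbol(gnInt, &nZero, sizeof(int));
    cudaMemcpyToSymbol(gnLock, &nZero, sizeof(int));
    PerThreadCriticalSection<<<NUM_BLOCKS, THREADS_PER_BLOCK>>>();
    cudaDeviceSynchronize();
    cudaMemcpyFromSymbol(&nResult, gnInt, sizeof(int));
    return nResult;
}
```

Every one of the 3200 threads contends for the same lock here, which is why this style tends to be much slower than arbitrating once per threadblock.
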
Thanks a lot, Robert. I am developing a big project, and the code snippet I posted is its skeleton. In my original project, the part that updates the global variables sits inside an if clause. In that setting your second solution gives results that vary from run to run (i.e. gnInt varies), while the first solution is always correct. The second implementation seems to miss some cases. Thank you very much anyway. You can post your solution as an answer and I will mark it as accepted. — user2665684, Oct 07 '14 at 06:37
In the code that I have shown, in both cases, the `gpnIntArray` is properly populated with 1, and the `gnInt` is correct (always). I think it's fair to say, based on that, there are no defects. I guess what you're saying is that in code that you haven't shown, there are defects. If I had to guess, I'd say you have a bug in your code, and perhaps an unclear understanding of how to implement these two cases. The second case certainly requires an understanding of how to implement intra-threadblock synchronization. The benefit is considerably faster code as I have demonstrated. — Robert Crovella, Oct 07 '14 at 06:50
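
The asker's real code is not shown, so the cause of the varying results is unknown. One possible pitfall, offered here only as an assumption, is placing the lock handling or `__syncthreads()` inside the per-thread if clause, so that not every thread reaches the barrier. The sketch below keeps the barriers and the lock in uniform control flow and applies the condition only inside each thread's turn. The predicate `bWantsUpdate` and the names `ConditionalCriticalSection` and `runConditionalDemo` are hypothetical.

```cuda
#include <cuda_runtime.h>

#define NUM_BLOCKS 100
#define THREADS_PER_BLOCK 32

__device__ int gpnIntArray[NUM_BLOCKS * THREADS_PER_BLOCK];
__device__ int gnInt, gnLock;

__global__ void ConditionalCriticalSection()
{
    // Hypothetical per-thread condition standing in for the unseen if clause:
    // only even-numbered threads want to update the globals.
    bool bWantsUpdate = (threadIdx.x % 2 == 0);

    // Lock acquisition is unconditional for the block.
    if (threadIdx.x == 0) {
        while (atomicCAS(&gnLock, 0, 1) != 0) {}
        __threadfence();
    }
    __syncthreads();

    volatile int *pnInt = &gnInt;
    for (int i = 0; i < blockDim.x; ++i) {
        // The condition guards only the update, never the barrier.
        if (threadIdx.x == i && bWantsUpdate) {
            int n = *pnInt;
            gpnIntArray[n] = 1;
            *pnInt = n + 1;
        }
        __syncthreads();  // reached by all threads, whatever bWantsUpdate is
    }

    if (threadIdx.x == 0) {
        __threadfence();
        atomicExch(&gnLock, 0);
    }
}

// Hypothetical host helper: resets the globals, launches, returns gnInt.
int runConditionalDemo()
{
    int nZero = 0, nResult = -1;
    cudaMemcpyToSymbol(gnInt, &nZero, sizeof(int));
    cudaMemcpyToSymbol(gnLock, &nZero, sizeof(int));
    ConditionalCriticalSection<<<NUM_BLOCKS, THREADS_PER_BLOCK>>>();
    cudaDeviceSynchronize();
    cudaMemcpyFromSymbol(&nResult, gnInt, sizeof(int));
    return nResult;
}
```

With this predicate, 16 of the 32 threads in each of the 100 blocks perform the update, so `runConditionalDemo()` is expected to return 1600.
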
