I'm implementing an algorithm in Cuda that needs to perform the following steps:
Given an array x
(in shared mem) and some device function f
,
- Select a pair of indices
(i,j)
tox
(randomly). - Calculate
y = f(x[i], x[i - 1], x[j], x[j + 1])
. - Based on
y
decide whether to exchange the positions of x[i] and x[j].
The problem is that the function f
depends on 4 values in shared memory, all of which have to be guaranteed to remain unchanged until after the swap.
For a minute I figured this could be the body of a critical section, but I don't see how I could use a single lock-address to lock 4 variables. The main problem, I think, is that when some thread is working on (i,j)
, other threads are not allowed to work on any pair (k,l)
where k
or l
are any of {i, i-1, j, j+1}
.
EDIT
Right after posting, an idea came to mind... Would it be possible to have a cascade of locks? First lock x[i]
, if that succeeds lock x[i-1]
, etc for all 4 values. Only if the final lock succeeds, process the above mentioned steps. I'll go experiment and keep this question open to other suggestions.