Is it possible for a thread to atomically update 4 different locations in shared memory?

Question

Suppose a thread of a kernel is trying to update 4 different places in shared memory. Can I cause that operation to fail and be reversed if any other thread has overwritten any of those locations? Specifically, can this be performed atomically?

```cuda
mem[a] = x;
mem[b] = y;
mem[c] = z;
mem[d] = w;
```

No, it can't be performed atomically in the general case where `a`, `b`, `c`, and `d` are arbitrary and/or `x`, `y`, `z`, and `w` are each 32 bits or larger. By "atomically" I mean using an atomic RMW operation that the hardware provides. You can use critical sections to achieve such things, probably at considerable performance cost, code complexity, and fragility. The other alternative is to recast your algorithm to use some form of parallel reduction. — Robert Crovella, Apr 25 '19 at 13:34
I see, thank you. I'll rethink my algorithm around this limitation. That comment looks like a valid answer, by the way. — MaiaVictor, Apr 25 '19 at 14:19

Robert Crovella · Accepted Answer · 2019-04-25T19:25:59.083

No, except for a special case.

This can't be performed atomically in the general case where a, b, c, and d are arbitrary (i.e., not necessarily adjacent) and/or x, y, z, and w are each 32 bits or larger.

I'm using "atomically" to refer to an atomic read-modify-write (RMW) operation that the hardware provides, such as atomicCAS.

Such operations are limited to a maximum of 64 bits in total, so four 32-bit (or larger) quantities, which add up to at least 128 bits, cannot be handled by one operation. Furthermore, all of the data must be contiguous and "naturally" aligned, so independent locations cannot be accessed in a single atomic cycle.

In the special case where the 4 quantities are 16-bit or 8-bit values that are adjacent and aligned (64 or 32 bits in total), you could build a custom atomic on top of atomicCAS.
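A minimal sketch of the 16-bit special case, assuming the four values are kept together in one 8-byte-aligned `unsigned long long` in shared memory. The names `pack4`, `field`, `try_update4` and `race_demo` are hypothetical helpers, not CUDA APIs; the only hardware primitive used is `atomicCAS`, which writes the new word only if the old one still matches, so a failed update leaves all four fields untouched and there is nothing to reverse. The 8-bit case would follow the same pattern with four bytes in an `unsigned int` and the 32-bit `atomicCAS` overload.

```cuda
#include <cstdint>

// Hypothetical helpers (not a CUDA API): four 16-bit fields packed into one
// 64-bit word. CUDA GPUs are little-endian, so field 0 occupies the
// lowest-addressed 16 bits of the word.
__host__ __device__ inline unsigned long long pack4(uint16_t f0, uint16_t f1,
                                                    uint16_t f2, uint16_t f3) {
    return (unsigned long long)f0
         | ((unsigned long long)f1 << 16)
         | ((unsigned long long)f2 << 32)
         | ((unsigned long long)f3 << 48);
}

__host__ __device__ inline uint16_t field(unsigned long long word, int i) {
    return (uint16_t)(word >> (16 * i));
}

// Replaces all four fields with `desired` only if the word still equals
// `expected`. atomicCAS returns the previous contents of *word, so comparing
// that with `expected` tells us whether the swap happened. On failure nothing
// is written.
__device__ bool try_update4(unsigned long long* word,
                            unsigned long long expected,
                            unsigned long long desired) {
    return atomicCAS(word, expected, desired) == expected;
}

// Demo: every thread races to move the fields from (0,0,0,0) to its own
// values (t, t+1, t+2, t+3). Only the first CAS can match `expected`; every
// later one sees a changed word and fails.
__global__ void race_demo(unsigned long long* out, int* successes) {
    __shared__ unsigned long long word;  // naturally (8-byte) aligned
    if (threadIdx.x == 0) word = pack4(0, 0, 0, 0);
    __syncthreads();

    uint16_t t = (uint16_t)threadIdx.x;
    unsigned long long mine = pack4(t, t + 1, t + 2, t + 3);
    if (try_update4(&word, pack4(0, 0, 0, 0), mine))
        atomicAdd(successes, 1);

    __syncthreads();
    if (threadIdx.x == 0) *out = word;
}
```

One caveat: compare-and-swap detects a change in value, not a write as such. If another thread overwrites the fields and later restores the same four values, `try_update4` still succeeds (the ABA problem).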

Alternatives to consider:

You can use critical sections to achieve this, probably at a considerable cost in performance, code complexity, and fragility.
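For the general case (four arbitrary 32-bit locations), here is a rough sketch of such a critical section, assuming every thread that touches those locations goes through the same block-wide lock in shared memory. `locked_cas4` and `lock_demo` are hypothetical names, and the four indices in the demo are arbitrary. Spin locks within a warp are notoriously fragile on pre-Volta GPUs; the retry-outside, attempt-inside loop below is the commonly suggested workaround rather than a guarantee, whereas GPUs with independent thread scheduling (compute capability 7.0 and later) are designed to support starvation-free algorithms like this one.

```cuda
// Hypothetical sketch: a four-location compare-and-swap made to look atomic
// by a block-wide spin lock. Returns true if all four locations held
// `expected` and were replaced by `desired`.
__device__ bool locked_cas4(int* lock, volatile int* mem, const int idx[4],
                            const int expected[4], const int desired[4]) {
    bool ok = false;
    bool done = false;
    // Retry loop outside, single attempt inside: on pre-Volta GPUs a plain
    // `while (atomicCAS(lock, 0, 1) != 0) {}` can deadlock within a warp.
    while (!done) {
        if (atomicCAS(lock, 0, 1) == 0) {        // acquired the lock
            __threadfence_block();               // order reads after acquire
            ok = mem[idx[0]] == expected[0] && mem[idx[1]] == expected[1] &&
                 mem[idx[2]] == expected[2] && mem[idx[3]] == expected[3];
            if (ok)
                for (int k = 0; k < 4; ++k) mem[idx[k]] = desired[k];
            __threadfence_block();               // publish writes before release
            atomicExch(lock, 0);                 // release the lock
            done = true;
        }
    }
    return ok;
}

// Demo (assumes blockDim.x >= 32): all threads try to move four scattered
// locations from 0 to their own value; exactly one of them succeeds.
__global__ void lock_demo(int* out, int* successes) {
    __shared__ int lock;
    __shared__ int mem[32];
    if (threadIdx.x == 0) lock = 0;
    if (threadIdx.x < 32) mem[threadIdx.x] = 0;
    __syncthreads();

    const int idx[4] = {3, 11, 17, 29};          // non-adjacent locations
    const int expected[4] = {0, 0, 0, 0};
    int t = threadIdx.x + 1;                     // never equal to expected 0
    const int desired[4] = {t, t, t, t};
    if (locked_cas4(&lock, mem, idx, expected, desired))
        atomicAdd(successes, 1);

    __syncthreads();
    if (threadIdx.x < 4) out[threadIdx.x] = mem[idx[threadIdx.x]];
}
```

Every thread in the block serializes on a single lock, which is where the performance cost comes from, and any code path that writes `mem` without taking the lock silently breaks the guarantee.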

Another alternative is to recast your algorithm to use some form of parallel reduction. Since you appear to be operating at the threadblock level, this may be the best approach.
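As one hypothetical way of recasting the problem (the question doesn't say what the four writes compute), the sketch below lets the block agree on a single writer with a shared-memory tree reduction, a min over the indices of threads that want to perform the update, after which only the winner does four ordinary stores. `block_min` and `elect_writer` are illustrative names; `BLOCK` is assumed to be a power of two, at least 4, and equal to `blockDim.x`.

```cuda
#include <climits>

// Block-level tree reduction in shared memory returning the minimum of `v`
// over all threads. BLOCK must be a power of two and equal blockDim.x, and
// every thread of the block must call this (it contains barriers).
template <int BLOCK>
__device__ int block_min(int v) {
    __shared__ int s[BLOCK];
    s[threadIdx.x] = v;
    __syncthreads();
    for (int stride = BLOCK / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride)
            s[threadIdx.x] = min(s[threadIdx.x], s[threadIdx.x + stride]);
        __syncthreads();
    }
    int result = s[0];
    __syncthreads();   // everyone has read s[0] before s can be reused
    return result;
}

// `wants[i]` is nonzero if thread i would like to perform the update; the
// lowest-indexed such thread wins. The slots stay -1 if nobody wants it.
template <int BLOCK>
__global__ void elect_writer(const int* wants, int* out) {
    __shared__ int mem[4];
    if (threadIdx.x < 4) mem[threadIdx.x] = -1;  // ordered by block_min's barriers

    int candidate = wants[threadIdx.x] ? (int)threadIdx.x : INT_MAX;
    int winner = block_min<BLOCK>(candidate);

    if ((int)threadIdx.x == winner) {            // at most one thread matches
        int t = threadIdx.x;
        mem[0] = t; mem[1] = t + 1; mem[2] = t + 2; mem[3] = t + 3;
    }
    __syncthreads();
    if (threadIdx.x < 4) out[threadIdx.x] = mem[threadIdx.x];
}
```

No atomics are needed here, because the `__syncthreads()` barriers order every step; the price is that all threads of the block must take part in the reduction.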
