
I have learned from SO threads here and here, among others, that it is not safe to assume that reads/writes of data in multithreaded applications are atomic at the OS/hardware level, and corruption of data may result. I would like to know the simplest way of making reads and writes of `int` variables atomic, using the `<stdatomic.h>` C11 library with the GCC compiler on Linux.

If I currently have an `int` assignment in a thread: `messageBox[i] = 2`, how do I make this assignment atomic? Equally for a reading test, like `if (messageBox[i] == 2)`.

Lundin
Theo d'Or
  • Perhaps a reference like [this one](https://en.cppreference.com/w/c/atomic) could help? – Some programmer dude Jan 09 '20 at 10:53
  • I have seen that but as it's a reference only, I was hoping someone here may have some code that I could make sense of. The reference is too terse, I don't know where to start. – Theo d'Or Jan 09 '20 at 11:22
  • To set an atomic value you must [*store*](https://en.cppreference.com/w/c/atomic/atomic_store) it, and to read an atomic value you must [*load*](https://en.cppreference.com/w/c/atomic/atomic_load) it. Those are basically the two operations you need (beyond initialization) for the use cases you show; see the sketch after these comments. – Some programmer dude Jan 09 '20 at 11:25
  • Any answer is going to be specific to whatever threading standard or threading library you are using. If it provides some way to get atomic accesses, then you use that. If it doesn't, then you're out of luck. (Assuming you want to write portable code.) – David Schwartz Jan 10 '20 at 06:17
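
A minimal sketch of the store/load approach described in the comments above, applied to the question's `messageBox` example (the array name comes from the question; its size and the helper functions are invented for illustration):

#include <stdatomic.h>

#define BOX_SIZE 16
_Atomic int messageBox[BOX_SIZE];    /* each element is an atomic int */

void write_message(int i)
{
    atomic_store(&messageBox[i], 2);           /* atomic write */
}

int message_is_set(int i)
{
    return atomic_load(&messageBox[i]) == 2;   /* atomic read */
}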

4 Answers


For C11 atomics you don't even have to use functions. If your implementation (= compiler) supports atomics, you can just add an atomic specifier to a variable declaration, and all subsequent operations on that variable are atomic:

_Atomic(int) toto = 65;
...
toto += 2;  // is an atomic read-modify-write operation
...
if (toto == 67) // is an atomic read of toto

Atomics have their price (they need considerably more computing resources than plain accesses), but as long as you use them sparingly they are the perfect tool to synchronize threads.
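
Applied to the example from the question, a minimal sketch might look like the following (the array name comes from the question; the size and the helper functions are made up for the example):

#include <stdatomic.h>

#define MSG_COUNT 8
_Atomic(int) messageBox[MSG_COUNT];   /* every element is atomic */

void set_message(int i)
{
    messageBox[i] = 2;                /* plain assignment, performed atomically */
}

int check_message(int i)
{
    return messageBox[i] == 2;        /* plain read, performed atomically */
}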

Jens Gustedt
  • Elegant. Terse but sufficient explanation! – ryyker Jan 09 '20 at 13:50
  • Thank you, that's exactly what I needed to get started. I see on the cppreference site that it gets even simpler with macros, and I am now using `volatile atomic_int x;` to declare atomic variables. (I appreciate the controversy around `volatile`, but using it with atomic types does no harm and may be helpful.) – Theo d'Or Jan 10 '20 at 09:39

> If I currently have an int assignment in a thread: `messageBox[i] = 2`, how do I make this assignment atomic? Equally for a reading test, like `if (messageBox[i] == 2)`.

You almost never have to do anything. In almost every case, the data which your threads share (or communicate with) are protected from concurrent access via such things as mutexes, semaphores and the like. The implementations of these base operations ensure that memory is synchronized.
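
For completeness, here is a hedged sketch of that conventional approach, using a pthread mutex around the kind of shared data in the question (the array name comes from the question; the lock, the size and the helper functions are invented for illustration):

#include <pthread.h>

static pthread_mutex_t box_lock = PTHREAD_MUTEX_INITIALIZER;
static int messageBox[16];

void post_message(int i)
{
    pthread_mutex_lock(&box_lock);
    messageBox[i] = 2;               /* protected write; the lock also synchronizes memory */
    pthread_mutex_unlock(&box_lock);
}

int message_posted(int i)
{
    pthread_mutex_lock(&box_lock);
    int set = (messageBox[i] == 2);  /* protected read */
    pthread_mutex_unlock(&box_lock);
    return set;
}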

The reason for these atomics is to help you construct safer race conditions in your code. There are a number of hazards with them, including that:

ai += 7;

would use an atomic protocol if ai were suitably defined. Trying to decipher race conditions is not aided by obscuring the implementation.

There is also a highly machine-dependent portion to them. The line above, for example, could fail [1] on some platforms, but how is that failure communicated back to the program? It is not [2].

Only one operation has the option of dealing with failure: `atomic_compare_exchange_(weak|strong)`. Weak just tries once and lets the program choose how and whether to retry; strong retries endlessly. It isn't enough to just try once -- spurious failures due to interrupts can occur -- but endless retries on a non-spurious failure are no good either.

Arguably, for robust programs or widely applicable libraries, the only bit of it you should use is `atomic_compare_exchange_weak()`.
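
As an illustration of handling failure explicitly, here is a hedged sketch of a weak compare-and-swap loop with a bounded number of retries; the function name, the bound and the fallback behaviour are invented for the example:

#include <stdatomic.h>
#include <stdbool.h>

/* Atomically add 7 to *ai, giving up after a fixed number of attempts.
 * Returns true on success, false if the bound was exhausted. */
bool add7_bounded(atomic_int *ai)
{
    int expected = atomic_load(ai);
    for (int tries = 0; tries < 100; tries++) {
        /* On failure, 'expected' is reloaded with the current value of *ai. */
        if (atomic_compare_exchange_weak(ai, &expected, expected + 7))
            return true;
    }
    return false;   /* the caller decides how to handle persistent failure */
}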

[1] Load-linked, store-conditional (ll-sc) is a common means for making atomic transactions on asynchronous bus architectures. The load-linked sets a little flag on a cache line, which will be cleared if any other bus agent attempts to modify that cache line. Store-conditional stores a value only if the little flag is still set in the cache, and clears the flag; if the flag has been cleared, store-conditional signals an error, so an appropriate retry operation can be attempted. From these two operations, you can construct any atomic operation you like on a completely asynchronous bus architecture.

ll-sc can have subtle dependencies on the caching attributes of the location. Permissible cache attributes are platform dependent, as is which operations may be performed between the ll and sc.

If you put an ll-sc operation on a poorly cached access, and blindly retry, your program will lock up. This isn't just speculation; I had to debug one of these on an ARMv7-based "safe" system.

[2]:

#include <stdatomic.h>
int f(atomic_int *x) {
    return (*x)++;
}
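/* Corresponding ARMv7 assembly; ldrex/strex are ARM's load-linked/store-conditional instructions: */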
f:
        dmb     ish
.L2:
        ldrex   r3, [r0]
        adds    r2, r3, #1
        strex   r1, r2, [r0]
        cmp     r1, #0
        bne     .L2       /* note the retry loop */
        dmb     ish
        mov     r0, r3
        bx      lr
mevets

The most portable way is to use the C11 atomic types. You can also use an atomic compare-and-swap as a spinlock to guard non-atomic variables. Here is a simple pthread producer/consumer example to play with; modify as desired. Notice that `cnt_non` and `cnt_vol` can be corrupted.

#include <stdio.h>
#include <stdint.h>
#include <pthread.h>
#include <stdatomic.h>

atomic_uint cnt_atomic;
int cnt_non;
volatile int cnt_vol;

typedef atomic_uint lock_t;
lock_t lockholder = 0;
#define LOCK_C 0x01
#define LOCK_P 0x02
int cnt_lock;  /* not atomic on purpose to test spinlock */
atomic_int lock_held_c, lock_held_p;

void
lock(lock_t *bitarrayp, uint32_t desired)
{
    uint32_t expected = 0; /* lock is not held */

    /* the value in expected is updated if it does not match
     * the value in bitarrayp.  If the comparison fails then compare
     * the returned value with the lock bits and update the appropriate
     * counter.
     */
    do {
        if (expected & LOCK_P) lock_held_p++;
        if (expected & LOCK_C) lock_held_c++;
        expected = 0;
    } while(!atomic_compare_exchange_weak(bitarrayp, &expected, desired));
}

void
unlock(lock_t *bitarrayp)
{
    *bitarrayp = 0;
}

void*
fn_c(void *thr_data)
{
    (void)thr_data;
    
    for (int i=0; i<40000; i++) {
        cnt_atomic++;
        cnt_non++;
        cnt_vol++;

        /* lock, increment, unlock */
        lock(&lockholder, LOCK_C);
        cnt_lock++;
        unlock(&lockholder);
    }

    return NULL;
}

void*
fn_p(void *thr_data)
{
    (void)thr_data;
    for (int i=0; i<30000; i++) {
        cnt_atomic++;
        cnt_non++;
        cnt_vol++;

        /* lock, increment, unlock */
        lock(&lockholder, LOCK_P);
        cnt_lock++;
        unlock(&lockholder);
    }
    
    return NULL;
}

void
drv_pc(void)
{
    pthread_t thr[2];

    pthread_create(&thr[0], NULL, fn_c, NULL);
    pthread_create(&thr[1], NULL, fn_p, NULL);

    for(int n = 0; n < 2; ++n)
        pthread_join(thr[n], NULL);
    
    printf("cnt_a=%d, cnt_non=%d cnt_vol=%d\n", cnt_atomic, cnt_non, cnt_vol);
    printf("lock %d held_c=%d held_p=%d\n", cnt_lock, lock_held_c, lock_held_p);

}
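
To try it out, a trivial driver like this should be enough, assuming the snippet above is in the same file and is compiled with something like `gcc -std=c11 -pthread`:

int main(void)
{
    drv_pc();   /* run the two-thread counter demo */
    return 0;
}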
dturvene

> that it is not safe to assume that reads/writes of data in multithreaded applications are atomic at the OS/hardware level, and corruption of data may result

Actually, non-composite operations on types like `int` are atomic on all reasonable architectures. What you read is simply a hoax.

(An increment is a composite operation: it has a read, a calculation, and a write component. Each component is atomic but the whole composite operation is not.)

But atomicity at the hardware level isn't the issue. The high-level language you use simply doesn't support that kind of manipulation of regular types. You need to use atomic types to even have the right to manipulate objects in such a way that the question of atomicity is relevant: when you are potentially modifying an object in use in another thread.

(Or volatile types. But don't use volatile. Use atomics.)
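
To make the language-level point concrete, here is a small sketch (the variable and function names are invented for the example): the hardware may well perform each plain store as a single atomic instruction, but only the `_Atomic` version gives the program defined behaviour when another thread accesses the object concurrently:

#include <stdatomic.h>

int plain_flag;               /* unsynchronized concurrent access is a data race (undefined behaviour) in C11 */
_Atomic int shared_flag;      /* concurrent access is defined by the language */

void signal_plain(void)  { plain_flag = 1; }    /* likely one store instruction, but still a race if read concurrently */
void signal_atomic(void) { shared_flag = 1; }   /* atomic store, well defined under concurrency */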

curiousguy
  • "Actually operations on types like int are atomic on all reasonable architecture. What you read is simply a hoax." Are you suggesting that people *rely* on them to be in portable code? Or are you suggesting people not write portable multithreaded code? – David Schwartz Jan 10 '20 at 06:17
  • @DavidSchwartz The assumption is 100% portable. – curiousguy Jan 10 '20 at 07:46
  • @curiousguy Your answer is interesting but appears somewhat self-contradictory. It would help if you provided some evidence (in relation to the Intel/AMD PCs that should qualify as "reasonable architecture", and GCC on Linux as "the high level language") and elaborate on how it is that the atomic operations at hardware level are not supported at language level without using atomic types. Surely, if the hardware operates atomically, the language would not prevent that, no matter what the type is. A more thorough explanation, please. – Theo d'Or Jan 10 '20 at 09:35
  • @Theod'Or Re: HW. The atomicity (of reads and writes) is trivially guaranteed: f.ex. there is simply no mechanism that could possibly divide a *naturally aligned* word write. (And it's hard to imagine anything on any hypothetical future HW that would change that.) And all compilers give natural alignment to declared variables. So tearing does not happen, ever. (OTOH complex operations such as increments are obv. not atomic.) Re: high level languages (C and C++). There is no legal way to concurrently change obj (unless they are "atomic" types). Period. It's illegal. – curiousguy Jan 10 '20 at 21:17
  • "_that the atomic operations at hardware level are not supported at language level without using atomic types_" A modification of a word size variable that's naturally aligned is always atomic, there is no non atomic instruction variant; only composite operations that read and then write a location are non atomic. But high level languages don't expect their own variables to be modified by invisible agents, unless they are "volatile". In C (and C++) other threads are invisible agents, and so are signal handlers. (In Java atomic is spelled volatile, and there is no C volatile.) – curiousguy Jan 10 '20 at 21:27
  • @curiousguy What standard can I find that in? – David Schwartz Jan 11 '20 at 01:28
  • @DavidSchwartz Find what? – curiousguy Jan 11 '20 at 01:39
  • @curiousguy Any reason to think that assumption is 100% portable beyond your claim. I'm old enough to remember when people were asserting that it was 100% portable to assume that operations were executed in the order they appeared in the program. – David Schwartz Jan 11 '20 at 20:49
  • @DavidSchwartz "_it was 100% portable to assume that operations were executed in the order they appeared in the program_" There is nothing in the std that says that some operations are not. In fact you can only make sense of C++ by assuming that. – curiousguy Jan 14 '20 at 00:39
  • @curiousguy Can you give an example of something in the standard that you can't make sense of without assuming that? Pretty much every threading standard provides atomic operations and clearly explains that if you don't use them, then your operations are not guaranteed to be atomic. It's hard to make sense of the idea that sometimes it's perfectly safe to assume that operations are atomic even if the relevant standard specifically provides a way to get atomic operations and says that you don't get them if you don't ask for them. How would you know precisely when you had that guarantee? – David Schwartz Jan 14 '20 at 01:28
  • @DavidSchwartz I said nothing about atomicity. I only said that if you don't assume instructions are executed sequentially, no program is defined. There is nothing allowing non sequential execution in the std. But atomicity is a property of the machine, not something guaranteed by C++, so it's an entirely different subject. There is no way for any machine to provide non atomic small basic types (like `int`). – curiousguy Jan 14 '20 at 01:45
  • @curiousguy See [this answer](https://stackoverflow.com/a/15718279/721269). The standard permits the implementation to violate any assumptions you might make about behavior the standard doesn't define as observable. Assuming operations "really" take place in instruction order is an assumption about non-observable behavior that is false on most modern platforms but was assumed to be a safe, portable assumption at one time. (So much code made these kinds of assumptions that it has constrained CPU development!) – David Schwartz Jan 14 '20 at 03:24
  • @curiousguy At one time, it would have made perfect sense to say there is no way for any machine to provide memory operations that execute out of order. But machines do that routinely now. You are deliberately telling people to make the very kind of assumptions that have caused huge pain in the past. Can we please learn from our mistakes? The relevant standards tell you specifically how to get atomic operations. Why not follow them? – David Schwartz Jan 14 '20 at 03:26
  • @DavidSchwartz "_was assumed to be a safe, portable assumption at one time_" When was it assumed to be safe? – curiousguy Jan 14 '20 at 04:20
  • @curiousguy It was taught as a safe assumption until 1995 or so, though the bad coding practices it created persisted in the field well past the year 2000 causing lots of code to fail mysteriously and unpredictably when recompiled for Itanium CPUs. Even modern x86-64 CPUs have safeties to protect code that makes these kinds of assumptions (such as pegging speculative fetches to cache lines so that invalidating the cache line invalidates the fetch, which has high cost and when first made, could not affect code that isn't broken). – David Schwartz Jan 14 '20 at 07:05
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/205915/discussion-between-curiousguy-and-david-schwartz). – curiousguy Jan 14 '20 at 07:40
  • @DavidSchwartz: In general, the stronger the semantic guarantees offered by a language/platform, the less work programmers will have to do to perform various tasks efficiently and reliably. I see no reason to blame programmers for the Committee's refusal to recognize that different platforms can offer different semantic guarantees, and that in many cases the most efficient way to perform many tasks on all platforms of interest will be to exploit guarantees that aren't supportable on all platforms. – supercat Feb 26 '20 at 19:30
  • @supercat You can't blame the committee for things that aren't standardized and that you don't think should be standardized. – David Schwartz Feb 26 '20 at 19:47
  • @DavidSchwartz: The stated intention of the Committee was to allow implementations extend the semantics of the language by defining behaviors in circumstances beyond those mandated by the Standard, and to allow conforming (but not *strictly* conforming) programs to exploit that. The authors of the Standard expressly wished to avoid being seen as to demeaning programs that achieved better performance than would be possible with strictly conforming code by exploiting such features. – supercat Feb 26 '20 at 20:09
  • @DavidSchwartz: Many purposes, especially anything involving interop with other languages, would be better served by a family of functions which--if defined--would perform real atomic operations on "ordinarily"-declared objects using a platform's natural defined means. If a single-core platform can e.g. perform non-sliceable 64-bit reads and writes, but cannot perform a 64-bit compare-and-swap, having real 64-bit atomic read/write operations that will interoperate with such actions in outside code, but not having 64-bit compare and swap would be better than having only phony... – supercat Feb 26 '20 at 20:15
  • ...uselessly-emulated 64-bit atomic reads and writes, even if one gained phony 64-bit compare-and-swap in the bargain. If the Standard had required that *all* implementations at minimum support compiler reordering barriers, those would be sufficient to accomplish many tasks even on platforms that can't support everything in the standard atomics library, and programmers could be faulted for failing to use them as appropriate (almost any implementation could support such directives in compatible fashion by disabling optimization and defining the barrier as an empty macro). – supercat Feb 26 '20 at 20:18
  • @supercat There is no C/C++ std and the C++ std is not based on the C std, **so linking C and C++ is 100% UB**. It works. It's expected to work. The "it's UB so we took pleasure breaking your code" expected from the quasi-troll on GCC ML and bugtracker is 100% hypocritical! – curiousguy Feb 27 '20 at 01:20
  • @DavidSchwartz Are you the kind of ppl who claimed that 2compl or IEEE fp are (or were) not guaranteed hence non-portable? `export` was in the std, hence exported templates were portable? `cout<<"hello world";` has UB hence non-portable? I could go on for hours. Not formally defined <<>> (is extremely different from) non portable. – curiousguy Feb 27 '20 at 01:23
  • @curiousguy: A major beef I have with C11 atomics is that the Standard not only fails to mandate interop, but *requires* that some implementations behave in useless ways that make interop impossible. – supercat Feb 27 '20 at 02:29
  • @supercat Can you give examples? – curiousguy Feb 27 '20 at 13:09
  • @curiousguy: Suppose one is targeting the original 80386, and doesn't have to coexist with DMA but is not allowed to disable interrupts. Not that it's in current use, but less obscure than others the Committee worries about, and it's an architecture I'm familiar with. If one wants a function to decrement a `uint16_t` at an address and report whether it became zero, one could implement that easily in machine code as `pop edx / pop ebx / xor eax,eax / dec word [ebx] / jz wasZero / inc eax / wasZero: jmp [edx]`. Note that there's nothing special about the `uint16_t` involved. – supercat Feb 27 '20 at 13:36
  • @curiousguy: If one wanted a function that would atomically decrement a counter and report the new value (rather than just whether it became zero), however, one could no longer simply use a simple `uint16_t`, but would instead have to pair a `uint16_t` with some form of mutex to prevent simultaneous access, and *all operations on that object would have to go through the mutex*. If a counter of the first style needs to be shared between two pieces of code processed by different implementations (e.g. a program and a device driver), each could implement the "dec and report if became zero"... – supercat Feb 27 '20 at 13:42
  • ...independently, without having to be aware of each other's existence. The variation with the mutex, however, would only work if everything that's going to access the counter is aware of the mutex and manages it in compatible fashion--something that's unlikely to occur. The C11 atomics, however, would require that a 16-bit atomic value be a horrible useless mutex monstrosity rather than a simple 16-bit unsigned integer that would be compatible with everything else in the universe. – supercat Feb 27 '20 at 13:46
  • @supercat: You're assuming a uniprocessor 386, I guess, not an SMP 386. (Or did 386 SMP run `dec word [ebx]` as an atomic RMW with respect to other CPUs?) I've read that 386 SMP had a sequentially-consistent memory model, unlike 486 and Pentium SMP systems, but I guess that just comes from not having store buffers (or cache?), not from making all RMWs into atomic RMWs. Anyway, you could simply use `lock dec word [ebx]` on such a 386 if you needed to. – Peter Cordes Oct 27 '22 at 15:33
  • So yes, your point still works that the C11 requirement to support a whole set of atomic operations in a lock-free way is cumbersome on systems that can do some but not all lock-free. Not sure how relevant it is in practice for non-retro systems. – Peter Cordes Oct 27 '22 at 15:33
  • @PeterCordes: The C Standard bends over backward to be adaptable to a wide range of platforms. Further, knowing whether a type supports "lock free" operations is absurdly unhelpful. Far more important are whether (1) atomic objects are representation-compatible with ordinary objects, (2) the system has a global standard way of adjudicating access which is shared among all language implementations targeting it, and (3) whether operations are *non-blocking*. Many ARM platforms support atomic operations that are non-blocking and are likely to be lock-free in practice, but... – supercat Oct 27 '22 at 16:09
  • ...could not be *proven* to be lock-free by an implementation which doesn't have control over everything else in the system that might conflict. Further, most approaches that aren't non-blocking can generally be divided into approaches that can handle conflicts with signals, and those that handle conflicts with other threads. Many programs only need one or the other, and I don't think it's generally possible to handle both without using a non-blocking algorithm. – supercat Oct 27 '22 at 16:15
  • See https://stackoverflow.com/questions/71866535/which-types-on-a-64-bit-computer-are-naturally-atomic-in-gnu-c-and-gnu-c-m/71867102#71867102 for some examples where, even though simple load/store instructions are atomic on the hardware in question, the compiler chooses to access the variable by some other method which is not atomic. – Nate Eldredge Oct 27 '22 at 17:25
  • @DavidSchwartz "_What standard can I find that in?_" in what std can I find an (almost) intelligible, meaningful, non contradictory C/C++ MT semantic description? There isn't such a thing. (Sadly.) – curiousguy Nov 09 '22 at 00:41