GCC's __sync builtins are obsoleted by its __atomic builtins.
Both reader and writer should be using __atomic operations, like __atomic_load_n and __atomic_store_n. (You don't need or want an expensive atomic RMW since there's only one writer.) With __ATOMIC_RELAXED, load and store are as cheap as plain operations, except that they can't get optimized away into registers. Normal accesses (not via __atomic builtins) to a plain non-volatile, non-_Atomic variable are never ok with concurrent reads + writes, and will break in practice, e.g. by hoisting a load out of a spin-wait loop.
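For example, a single-writer counter using those builtins might look like this (a sketch; the function names are mine):

```c
#include <stdint.h>

static uint64_t counter;  // accessed only via __atomic builtins

uint64_t read_counter(void) {
    return __atomic_load_n(&counter, __ATOMIC_RELAXED);
}

void increment_counter(void) {  // call from the single writer thread only
    uint64_t tmp = __atomic_load_n(&counter, __ATOMIC_RELAXED);
    // a plain store, not an RMW: safe because nobody else ever writes counter
    __atomic_store_n(&counter, tmp + 1, __ATOMIC_RELAXED);
}
```

With __ATOMIC_RELAXED, both functions compile to ordinary load/store instructions on mainstream ISAs; the builtins just stop the compiler from caching the value in a register across calls.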
But in new code, you should normally use C11 stdatomic.h with _Atomic types (https://en.cppreference.com/w/c/thread), and the macros available for testing whether a given atomic type is lock-free on the target you're compiling for.
(With the old __sync builtins, I think the design intent was that you'd use volatile for pure-load and pure-store, since there were no __sync builtins for those. Hand-rolling your own atomics with volatile works on normal ISAs with normal compilers like GCC and clang, and the Linux kernel depends on it, but it's not recommended when you can get the job done with C11 stdatomic.h.)
Lock-free 64-bit atomics
On systems where ATOMIC_LLONG_LOCK_FREE > 0, use relaxed (or acquire/release) atomics, depending on what ordering guarantees you need (e.g. if the writer is using the counter to "publish" other data to readers, such as when the counter is an index into a non-atomic array).
#include <stdatomic.h>
#include <stdint.h>

#if ATOMIC_LLONG_LOCK_FREE > 0

static atomic_uint_fast64_t counter;

// make sure this can inline into readers
uint64_t read_counter(void) {
    return atomic_load_explicit(&counter, memory_order_relaxed);  // or memory_order_acquire
}

void increment_counter_single_writer(void) {  // call from one thread only
    uint64_t tmp = atomic_load_explicit(&counter, memory_order_relaxed);
    // or keep a local copy of the counter in a register and *just* store:
    // other threads just see the values we store, not how we got them
    atomic_store_explicit(&counter, tmp + 1, memory_order_relaxed);  // or memory_order_release
}

// with multiple writers, use atomic_fetch_add_explicit

#else

static uint64_t counter;
static _Atomic unsigned seq;   // for a SeqLock fallback
uint64_t read_counter(void) { ... }

#endif
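The "keep a local copy in a register and *just* store" variant mentioned in the comment could look like this (a self-contained sketch; writer_count is my name for the writer's private copy):

```c
#include <stdatomic.h>
#include <stdint.h>

static atomic_uint_fast64_t counter;   // shared with readers

// With exactly one writer, nobody else ever modifies counter, so the
// writer's private count is always up to date and it never has to
// reload the shared variable at all.
static uint64_t writer_count;          // touched by the single writer only

void increment_counter_single_writer(void) {
    writer_count++;
    atomic_store_explicit(&counter, writer_count, memory_order_relaxed);
}

uint64_t read_counter(void) {
    return atomic_load_explicit(&counter, memory_order_relaxed);
}
```

This saves the writer a load per increment, and makes it obvious that no RMW is involved.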
Note that some 32-bit systems can do lock-free 64-bit atomics, such as x86 (since P5 Pentium) and some ARM32. See how this compiles on Godbolt with clang for x86 and ARM Cortex-A8 (to pick a random ARM that's not recent).
Otherwise probably a SeqLock
Otherwise, without lock-free 64-bit atomics, use a SeqLock if the counter doesn't increment too often. (See Implementing 64 bit atomic counter with 32 bit atomics). This still lets the readers be truly read-only so they don't contend with each other for cache lines, they only have to retry if they tried to read while the writer was in the middle of an update. (After the writer is done, the cache line containing the counter can be in Shared state on all cores running reader threads, so they can all get hits in cache.)
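A minimal C11 SeqLock sketch for that fallback (names and structure are my own; single writer assumed): the writer makes the sequence number odd while it updates the halves, and readers retry if they saw an odd or changed sequence.

```c
#include <stdatomic.h>
#include <stdint.h>

static _Atomic unsigned seq;   // even = stable, odd = write in progress
static atomic_uint lo, hi;     // 32-bit halves; relaxed, seq provides ordering

void increment_counter(void) {           // single writer only
    static uint64_t count;               // writer's private copy
    count++;
    unsigned s = atomic_load_explicit(&seq, memory_order_relaxed);
    atomic_store_explicit(&seq, s + 1, memory_order_relaxed);  // now odd
    atomic_thread_fence(memory_order_release);  // seq store before payload stores
    atomic_store_explicit(&lo, (uint32_t)count, memory_order_relaxed);
    atomic_store_explicit(&hi, (uint32_t)(count >> 32), memory_order_relaxed);
    atomic_store_explicit(&seq, s + 2, memory_order_release);  // even again, publish
}

uint64_t read_counter(void) {
    unsigned s0, s1;
    uint32_t l, h;
    do {
        s0 = atomic_load_explicit(&seq, memory_order_acquire);
        l  = atomic_load_explicit(&lo, memory_order_relaxed);
        h  = atomic_load_explicit(&hi, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);  // payload loads before re-check
        s1 = atomic_load_explicit(&seq, memory_order_relaxed);
    } while ((s0 & 1) || s0 != s1);     // retry if torn or mid-write
    return ((uint64_t)h << 32) | l;
}
```

Making the payload relaxed atomics (rather than plain uint32_t) avoids data-race UB in the C11 memory model while still compiling to plain loads/stores.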
For a monotonic counter, the halves of the counter itself can work as a sequence number to detect tearing: e.g. read the low half before and after reading the high half, and retry if they differ.
A readers/writers lock would force readers to contend with each other to modify the cache line holding the lock, so the total throughput doesn't scale with number of readers. If you have a very-frequently modified counter (so a seqlock would often be in an inconsistent state), you might consider something more clever, like a queue of recent values so readers could check the most recent consistent value or something?
BTW, ATOMIC_LLONG_LOCK_FREE > 0 seems appropriate: ATOMIC_LLONG_LOCK_FREE == 1 means "sometimes lock-free", but in practice there aren't implementations where some objects are lock-free and some aren't. And if there were, we'd hope a compiler could arrange for a lone global/static variable to be aligned such that it's atomic.
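If you did want to handle the "sometimes lock-free" case at run time, a helper like this could combine the compile-time macro with the per-object check (a sketch; the helper name is mine):

```c
#include <stdatomic.h>
#include <stdbool.h>

// True if long long atomics are guaranteed lock-free at compile time
// (ATOMIC_LLONG_LOCK_FREE == 2, per C11 7.17.1), or if this particular
// object happens to be lock-free at run time.
static bool llong_counter_is_lock_free(_Atomic long long *p) {
    return ATOMIC_LLONG_LOCK_FREE == 2 || atomic_is_lock_free(p);
}
```

On mainstream 64-bit targets this is always true, which is why testing the macro against > 0 is fine in practice.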