Best way to atomically bitwise AND a byte in C/C++?

Question

Currently looking at atomic operations in C/C++ using GCC and discovered that naturally aligned global variables in memory have atomic reads and writes.

However, I was trying to bitwise AND a global variable and noticed it boils down to a read-modify-write sequence which is troublesome if there are multiple threads operating on that byte value.

After some research, I've settled on these two examples:

C Example - GCC extension __sync_fetch_and_and

#include <stdio.h>
#include <stdint.h>

uint8_t byteC = 0xFF;

int main() {
    __sync_fetch_and_and(&byteC, 0xF0);
    printf("Value of byteC: 0x%X\n", byteC);
    return 0;
}

C++ Example - C++11 using atomic fetch_and

#include <atomic>
#include <cstdint>
#include <iostream>

std::atomic<std::uint8_t> byteCpp(0xFF);

int main() {
    byteCpp.fetch_and(0xF0);
    std::cout << "Value of byteCpp: 0x" << std::hex << static_cast<int>(byteCpp.load()) << std::endl;
    return 0;
}

Other examples follow but they seemed less intuitive and more computationally expensive.

Using a pthread_mutex_lock

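#include <pthread.h>
#include <stdint.h>

uint8_t byte = 0xFF;
pthread_mutex_t byte_mutex = PTHREAD_MUTEX_INITIALIZER;

void locked_and(void) {
    pthread_mutex_lock(&byte_mutex);
    byte &= 0xF0;  /* safe only if every access to byte takes byte_mutex */
    pthread_mutex_unlock(&byte_mutex);
}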

Using a mutex lock_guard

#include <cstdint>
#include <mutex>

std::uint8_t byte;
std::mutex byte_mutex;

void atomic_and() {
    std::lock_guard<std::mutex> lock(byte_mutex);
    byte &= 0xF0;
}

Using a compare_exchange_weak

#include <atomic>
#include <cstdint>

std::atomic<std::uint8_t> byte;

void atomic_and() {
    std::uint8_t old_val = byte.load();
    // On failure, compare_exchange_weak writes the current value into old_val.
    while (!byte.compare_exchange_weak(old_val, old_val & 0xF0)) {
    }
}

Question

What's the best atomic method for a read-modify-write sequence in a multithreaded C/C++ program?

"naturally aligned global variables in memory have atomic reads and writes" - ehh, what? Not as far as I know. — Jesper Juhl, Aug 18 '23 at 17:56
You seem to be asking two different questions, one for C and one for C++. The answer is likely to be different in each case. Choose which language you are asking about and post a separate question for the other language if you really need an answer for both. — François Andrieux, Aug 18 '23 at 17:57
@JesperJuhl: *naturally aligned global variables in memory have atomic reads and writes* - in assembly language yes for small power of 2 sizes on most ISAs, e.g. [x86](https://stackoverflow.com/q/36624881/224132). But this isn't assembly language, so your objection is valid. "Atomic" in C and C++ also means stopping the optimizer from removing or reordering operations. Also, you definitely don't get RMW atomicity for free! An atomic pure load and then a later atomic store of a different value is very much not an atomic RMW, that's why x86 has instructions like `lock and byte [mem], reg`. — Peter Cordes, Aug 18 '23 at 18:00
Is this true or not? The guarantee on Intel is that a single aligned 32-bit memory write will happen atomically: i.e., if you write an int to memory you don’t have to worry that another thread will see some of the bits but not all. — vengy, Aug 18 '23 at 18:02
`__sync_fetch_and_and` is a compiler intrinsic, so the part of the question is not about C but is rather about gcc. — 273K, Aug 18 '23 at 18:02
@JesperJuhl: Perhaps the OP is referring to having read the comment thread on [Does the value of \`std::memory\_ordering\` affect both compiler reordering and hardware instructions on atomic objects?](//stackoverflow.com/posts/comments/135597989) yesterday, where it came up that access to plain non-atomic variables compiles to the same asm instructions (like x86 `mov`) as `relaxed` atomics. That does *not* mean that plain C variables *are* atomic in any sense. See [LWN: Who's afraid of a big bad optimizing compiler?](https://lwn.net/Articles/793253/) re: pitfalls of plain vars in Linux kernel. — Peter Cordes, Aug 18 '23 at 18:05
@vengy - A single write happens atomically, if it happens. But for a non-atomic variable, the compiler might keep the value in a register, and perhaps write it later. For an AND operation you are just lost as it is both a read and write. Why not trust `std::atomic` to use the best possible way to perform the operation? — BoP, Aug 18 '23 at 18:06
@BoP Thanks. I'll stick with std::atomic as it's the least confusing for me. I actually thought writing a value to a global such as `byte = 0` would be an atomic operation. — vengy, Aug 18 '23 at 18:11
@vengy: “The guarantee on Intel is that a single aligned 32-bit memory write will happen atomically”: Did the compiler guarantee to you that if you assign a value to a 32-bit object in C source code, it will implement that with a single aligned 32-bit memory write? — Eric Postpischil, Aug 18 '23 at 18:13
If you did want to use GNU C builtins, don't use the obsolete `__sync` builtins. Use `__atomic_fetch_and(&byte, 0xf0, __ATOMIC_RELAXED)` or whatever (https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html) which also provide atomic pure-load and atomic pure-store, instead of having to hack with `*(volatile char*)` to force a load or store to happen. — Peter Cordes, Aug 18 '23 at 18:13
Re: "the guarantee on Intel ... " see [Why is integer assignment on a naturally aligned variable atomic on x86?](https://stackoverflow.com/q/36624881) and the section at the start about how C and C++ are different from asm, since there's an optimizer that can keep values in registers, among other code transformations. Also [Which types on a 64-bit computer are naturally atomic in gnu C and gnu C++? -- meaning they have atomic reads, and atomic writes](https://stackoverflow.com/q/71866535) - none. — Peter Cordes, Aug 18 '23 at 18:16
@JesperJuhl "_x86 is not the only architecture that C++ supports_" But the remark that read and write operations are atomic under certain conditions is true of any reasonable arch that C++ supports. — curiousguy, Aug 21 '23 at 20:38
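
As a rough illustration of the `__atomic` builtins recommended in the comments above, the following sketch wraps the fetch-AND, pure-load and pure-store builtins around a plain byte. It assumes GCC or Clang (the builtins are compiler extensions, not ISO C or C++), and the names `gnu_byte`, `gnu_fetch_and`, `gnu_load`, `gnu_store` and `gnu_demo` are hypothetical, chosen only for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical global for illustration; a plain object, not std::atomic.
std::uint8_t gnu_byte = 0xFF;

// Atomically performs gnu_byte &= mask and returns the previous value.
// __ATOMIC_RELAXED requests atomicity only, with no ordering guarantees.
std::uint8_t gnu_fetch_and(std::uint8_t mask) {
    return __atomic_fetch_and(&gnu_byte, mask, __ATOMIC_RELAXED);
}

// Atomic pure load and pure store, with no volatile casts needed.
std::uint8_t gnu_load() { return __atomic_load_n(&gnu_byte, __ATOMIC_RELAXED); }
void gnu_store(std::uint8_t v) { __atomic_store_n(&gnu_byte, v, __ATOMIC_RELAXED); }

// Single-threaded sanity check of the three wrappers.
void gnu_demo() {
    gnu_store(0xFF);
    assert(gnu_fetch_and(0xF0) == 0xFF);  // returns the old value
    assert(gnu_load() == 0xF0);           // the AND was applied
}
```

Unless there is a specific reason to stay with compiler builtins, `std::atomic` (or `<stdatomic.h>` in C) expresses the same operations portably.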

Jan Schultke · Accepted Answer · 2023-08-18T18:34:32.193

7

[I have] discovered that naturally aligned global variables in memory have atomic reads and writes.

This is not correct in a C/C++ sense, only in an x86_64 sense. It is true that naturally aligned loads and stores of up to 8 bytes on x86_64 are atomic, but that isn't correct for the abstract machine. Concurrent access to a non-atomic object where at least one access is a write is always a data race, and thread sanitizers might catch the mistake, even if the architecture theoretically makes it safe.

Furthermore, the best way to do byte &= 0xf0 atomically is very similar in C and C++:

// C++
#include <atomic>
#include <cstdint>
std::atomic_uint8_t byte; // or std::atomic<std::uint8_t>
// ...
std::uint8_t old = byte.fetch_and(0xf0); /* optionally specify memory order */
// or
std::uint8_t old = std::atomic_fetch_and(&byte, 0xf0);

// C (no compiler extensions/intrinsics needed)
#include <stdatomic.h>
#include <stdint.h>
_Atomic uint8_t byte; // ISO C provides atomic_uint_least8_t, but no atomic_uint8_t typedef
// ...
uint8_t old = atomic_fetch_and(&byte, 0xf0); /* optionally atomic_fetch_and_explicit */

The other methods (POSIX threads, std::mutex, compare_exchange retry loop) are almost certainly worse than the built-in fetch_and functions. If the architecture doesn't directly provide an atomic fetch-AND instruction, the compiler chooses whichever fallback is best for the target. It's not something you have to worry about.
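
To make the multithreaded case concrete, here is a minimal sketch in which eight threads each clear one distinct bit of a shared byte with `fetch_and`. The helper name `clear_all_bits_concurrently` is hypothetical, and relaxed ordering is used on the assumption that only the atomicity of the read-modify-write matters here. Because each AND is a single atomic RMW, no thread's update can be lost, so the result should be 0 regardless of scheduling.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper (not from the original post): starts from 0xFF and has
// eight threads each clear one distinct bit using an atomic fetch_and.
std::uint8_t clear_all_bits_concurrently() {
    std::atomic<std::uint8_t> flags{0xFF};
    std::vector<std::thread> workers;

    for (int bit = 0; bit < 8; ++bit) {
        workers.emplace_back([&flags, bit] {
            const auto mask = static_cast<std::uint8_t>(~(1u << bit));
            // Relaxed ordering: only the atomicity of the RMW is required.
            flags.fetch_and(mask, std::memory_order_relaxed);
        });
    }

    // join() synchronizes with each thread's completion, so the load below
    // observes every fetch_and.
    for (auto& t : workers) t.join();

    const std::uint8_t result = flags.load();
    assert(result == 0);  // every bit was cleared exactly once
    return result;
}
```

With a plain `uint8_t` and `flags &= mask` instead, the same program would contain a data race and updates could be lost.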

It's true in assembly on most ISAs, not just x86-64. That's why `std::atomic` `.load(relaxed)` / `.store(relaxed)` just compile to plain load and store instructions, the same ones compilers use for plain variables, for T the width of a register or less on most ISAs. But yes, totally flawed reasoning, being atomic *in C or C++* also means keeping the optimizer's hands off. See [LWN: Who's afraid of a big bad optimizing compiler?](https://lwn.net/Articles/793253/) re: pitfalls of plain vars in Linux kernel (where they use `volatile` with GCC and known types, and `asm` for stronger ordering). – Peter Cordes Aug 18 '23 at 18:11
A coworker asked me about read/writes to a global variable in C++, so my initial reaction was to use std::atomic, but other people said that assigning an int on an aligned variable was atomic. Anyhow, thanks for all the replies. I have much to learn... – vengy Aug 18 '23 at 18:32
@vengy: Those people are super duper wrong. Use `std::atomic` with `std::memory_order_relaxed` if all you need is atomicity (and for the optimizer to know that another read might get a new value), not ordering wrt. any other operations. This is like what `volatile int` gives you (compiling to about the same asm with real compilers), except it's well-defined by ISO C and ISO C++. [When to use volatile with multi threading?](//stackoverflow.com/a/58535118) (never unless you know exactly what you're doing, e.g. to micro-optimize something like `atomic` where GCC code-gen sucks) – Peter Cordes Aug 18 '23 at 23:06
And of course `volatile` or plain variables won't give you RMW atomicity; the Linux kernel rolls its own atomics using inline asm for that. But even for load/store atomicity, see [Who's afraid of a big bad optimizing compiler?](https://lwn.net/Articles/793253/) for some of the more obscure things that can go wrong when using non-atomic non-volatile variables with memory barriers (or non-inline function boundaries) to force memory accesses. – Peter Cordes Aug 18 '23 at 23:09
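
The advice in the comments above, using `std::atomic` with `std::memory_order_relaxed` when only atomicity is needed, might look like the following sketch for a shared global. The names `shared_value`, `publish`, `observe`, `clear_bits` and `relaxed_demo` are hypothetical and exist only for this example.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared global, replacing a plain (or volatile) int.
std::atomic<int> shared_value{0};

// Atomic pure store: the optimizer may not drop it or keep it in a register.
void publish(int v) { shared_value.store(v, std::memory_order_relaxed); }

// Atomic pure load: each call really reads the shared object.
int observe() { return shared_value.load(std::memory_order_relaxed); }

// Atomic RMW that clears the bits set in mask and returns the previous value.
int clear_bits(int mask) {
    return shared_value.fetch_and(~mask, std::memory_order_relaxed);
}

// Single-threaded sanity check of the three operations.
void relaxed_demo() {
    publish(0xFF);
    assert(observe() == 0xFF);
    assert(clear_bits(0x0F) == 0xFF);  // returns the old value
    assert(observe() == 0xF0);         // low nibble cleared
}
```

Relaxed operations give no ordering with respect to other memory accesses; if one thread must see data written before the flag was set, release/acquire ordering would be needed instead.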
