6

If I have atomic_bool flag;, how can I write C code to toggle it that's atomic, portable, and efficient? Regarding "efficient", I'd like it to assemble on x86_64 to lock xorb $1, flag(%rip). The "obvious" flag = !flag; is out because it isn't actually atomic. My next guess would be flag ^= true;, which assembled to this mess on GCC:

        movzbl  flag(%rip), %eax
0:
        movb    %al, -1(%rsp)
        xorl    $1, %eax
        movl    %eax, %edx
        movzbl  -1(%rsp), %eax
        lock cmpxchgb   %dl, flag(%rip)
        jne     0b

And this mess on Clang:

        movb    flag(%rip), %al
0:
        andb    $1, %al
        movl    %eax, %ecx
        xorb    $1, %cl
        lock            cmpxchgb        %cl, flag(%rip)
        jne     0b

Then I tried specifying a weaker memory order by doing atomic_fetch_xor_explicit(&flag, true, memory_order_acq_rel); instead. This does what I want on Clang, but GCC now completely fails to compile it with error: operand type '_Atomic atomic_bool *' {aka '_Atomic _Bool *'} is incompatible with argument 1 of '__atomic_fetch_xor'. Interestingly, if my type is an atomic_char instead of an atomic_bool, then both GCC and Clang emit the assembly that I want. Is there a way to do what I want with atomic_bool?

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
  • 4
    Note from standard: [7.17.7.5](https://port70.net/~nsz/c/c11/n1570.html#7.17.7.5) xor is not applicable to `atomic_bool`. (I wonder if the "None of these operations is applicable to atomic_bool" part from the standard suggest that compilers _should_ refuse to compile such code, making clang non-conformant here.) – KamilCuk Aug 17 '20 at 16:10
  • 2
    This is a deliberate choice in gcc. See [bugzilla 68966](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68966) and [bugzilla 68908](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68908#c13). – P.P Aug 17 '20 at 16:22
  • 3
    Is there a reason you can't use `atomic_char flag;`? – chtz Aug 17 '20 at 16:27
  • @chtz I *can*, but it feels like an awful hack, and then it'd be much easier for someone to mistakenly put something other than 0 or 1 in it, which would end badly. – Joseph Sible-Reinstate Monica Aug 17 '20 at 16:30
  • Efficiency just doesn't have anything to do with the code generation. Count about 150 cycles for the lock and you'll rarely be disappointed. – Hans Passant Aug 17 '20 at 16:33
  • @JosephSible-ReinstateMonica gcc considers it not allowed per standard (even if there's a question about what "not applicable" C11, 7.17.7.5 means). You could instead use: `typedef unsigned char mybool_t;` and then you could use it with the gcc intrinsics. It's not ideal - having to create another "bool" type - but it could be a good enough workaround (At least, tThat's what I did when I had this problem in the past :) – P.P Aug 17 '20 at 16:37
  • 1
    @JosephSible-ReinstateMonica Using `_Bool` isn't any safer than using `char` when it comes to "someone setting it to a value other than 0 or 1", just use `atomic_char`, and if it is a global, then provide `set`/`get` functions that accept a bool and write to a char. If you really think that it is a hack, remember that C bool type is just a hacked in typedef for `_Bool`, whatever that is, which is a hack that is here simply to keep backward compatibility with code that may have used "bool" with assumption that it isn't a keyword, and there's no point in worrying about such things anyway. – Yamirui Aug 17 '20 at 17:23
  • 3
    @Yamirui `_Bool flag = 2;` won't actually store a 2, but `char flag = 2;` will, so I disagree that `_Bool` isn't any safer. – Joseph Sible-Reinstate Monica Aug 17 '20 at 17:31
  • 1
    @JosephSible-ReinstateMonica both are fundamentally wrong to do and it is a problem with the programmer doing it. Worrying about someone using `2` when there's no valid boolean value that can be mapped from `2` is like worrying about someone passing a `NULL` to api that clearly states it an undefined behaviour to do so. This is just how it is in C and it is too late to change that now. – Yamirui Aug 17 '20 at 20:17
  • 2
    @Yamirui: So you're arguing the `bool foo = return_zero_or_nonzero();` is an error, and you should have written `bool foo = (f() != 0);` to explicitly booleanize, rather than rely on implicit conversion to bool? That's one style choice, but with `unsigned char` the compiler isn't going to warn you if you get it wrong. – Peter Cordes Jun 02 '22 at 17:13
  • 1
    If compilers suck at `flag ^= 1;`, that's a missed optimization on their part, and should get reported (https://github.com/llvm/llvm-project/issues and https://gcc.gnu.org/bugzilla/enter_bug.cgi?product=gcc). If the return value is unused, yes, `lock xorb` is optimal. And if it is used, `lock btc $0, flag(%rip)`. – Peter Cordes Jun 02 '22 at 17:24
  • @PeterCordes Do the compilers actually suck at that, or is the mess they make necessary for that to be `memory_order_seq_cst` (which is what the standard requires for any operation where you don't specify an explicit one, even though I don't need it)? – Joseph Sible-Reinstate Monica Jun 02 '22 at 17:28
  • 1
    `lock xorb` is a full barrier and more than sufficient for a seq_cst RMW, just like `lock addl` is safe for `atomic_fetch_add`. (Or `lock xaddl` if the return value is used.) x86 can't do atomic RMWs with anything less than a full memory barrier, so `atomic_fetch_add_explicit` for a relaxed integer add only allows compile-time reordering, still `lock add` or `lock xadd` in the asm, same as seq_cst. See [The strong-ness of x86 store instruction wrt. SC-DRF?](https://stackoverflow.com/q/70249647) re: `xchg` or other locked insn being as strong as a full SC *fence*. – Peter Cordes Jun 02 '22 at 17:36
  • 2
    It's peculiar that the C standard does not require supporting `atomic_fetch_xor` and friends on `atomic_bool`, whereas AFAICT it does require supporting `^=`. So it really doesn't save anything for the implementation, except that it deprives the programmer of their choice of memory ordering. – Nate Eldredge Jun 06 '22 at 14:19
  • 1
    Anyway it seems clear from the comments that the answer is that you can't. From "atomic, portable, and efficient" you can pick two. There evidently isn't any portable way except for `flag ^= 1` (or equivalents like `flag -= 1`), and gcc optimizes them poorly. You can get the better-optimized version, at cost of portability, with `atomic_fetch_xor((atomic_uchar *)&flag, 1)`. Wrap in ifdef as needed. Do you want that in an answer, or are you holding out for a new idea out of left field? – Nate Eldredge Jun 06 '22 at 14:46
  • @NateEldredge Yeah, at this point I'd accept a summary of the comments as an answer. Hopefully it will be useful to help convince the compiler authors to add such a way. – Joseph Sible-Reinstate Monica Jun 07 '22 at 14:29
  • (Probably just changing what `^=` compiles to, as @PeterCordes mentioned.) – Joseph Sible-Reinstate Monica Jun 07 '22 at 14:30
  • 1
    Out of curiosity: what's the use of such an operation? I'm sure it's obvious but I can't figure it out :) It makes sense to `lock xor` a bitfield when different threads xor different bits. But if you let `N` threads atomically xor the same boolean flag won't the result just be `N % 2`? If that's the case, wouldn't be easier to use `atomic_add` and extract the LSb when needed? If the threads compete to set/reset the flag, wouldn't you need a form of synchronization to make sure that two threads trying to set the flag won't end up setting and resetting it? In that case you wouldnt need an atomic – Margaret Bloom Jun 07 '22 at 20:41

1 Answers1

3

Mainly summarizing comments:

It looks like the only portable way to atomically toggle your atomic_bool is flag ^= 1. But as you noted, gcc and clang don't know how to optimize it, and fall back to the cmpxchg loop. If you want full portability and compliance I think you just have to put up with that, until such time as they fix their missed optimization, which you might want to report.

In principle, another option should be flag -= 1 or flag += -1, which have the same truth table when nonzero values are treated as true. However, gcc compiles it to the same inefficient xor/cmpxchg code as flag ^= 1, and clang actually miscompiles it: when flag == 0, then flag -= 1 will set flag to 0xff which is invalid. It looks like this was reported several years ago but is still unfixed.

If you want a workaround, at least on x86 you should be able to do

atomic_fetch_xor((atomic_uchar *)&flag, 1);

I think it's okay for strict aliasing because atomic_uchar is a character type. In practice it is most likely fine anyway, because an atomic access shouldn't be optimized away. To be safe, check the assembly that is generated, or just go ahead and replace the whole thing with the appropriate one-liner of inline asm.


It's a nice touch that clang extends the atomic_fetch_* functions to work on atomic_bool, even though the C standard doesn't support it (7.17.7.5p1: "None of these operations is applicable to atomic_bool.") I don't really understand why the standards committee included that exception. All those operations still have to be available on atomic_bool via the compound assignment operators, so omitting them from atomic_fetch_* serves only to deprive the programmer of being able to use any weak memory ordering, without making life any easier for the implementation.

For similar reasons, I also don't understand why they didn't provide atomic_fetch_* for the remaining compound assignment operators. atomic_fetch_mul might not be that useful, but since *= has to work, it shouldn't cost the implementation anything to speak of, and the consistency would be nice.

Nate Eldredge
  • 48,811
  • 6
  • 54
  • 82
  • 1
    [Maragaret Bloom commented](https://stackoverflow.com/questions/63454557/efficient-way-to-toggle-an-atomic-bool/72573939#comment128136536_63454557) that for most use-cases, `atomic` with `fetch_add` should wok just as well: increment or decrement flips the low bit, and readers can test that cheaply. This even allows x86 `lock xadd` to do the fetch part of fetch_add. (Although with `lock xor` on a bool you also find out the old value via ZF, so `atomic_bool` for `bool tmp = (flag ^= 1);` could still compile efficiently without a CAS loop, just `lock xor`.) – Peter Cordes Jun 10 '22 at 20:28