Why does stdatomic.h contain atomic_uint_least16_t and atomic_uint_fast16_t but not atomic_uint16_t?

Question

stdatomic.h appears to contain atomic_uint_least16_t and atomic_uint_fast16_t, which are _Atomic versions of the stdint.h types uint_least16_t and uint_fast16_t, but it does not contain atomic_uint16_t. Why?

For some background information from the N1548 draft:

7.18.1.1 Exact-width integer types

1 The typedef name intN_t designates a signed integer type with width N, no padding bits, and a two’s complement representation. Thus, int8_t denotes such a signed integer type with a width of exactly 8 bits.

2 The typedef name uintN_t designates an unsigned integer type with width N and no padding bits. Thus, uint24_t denotes such an unsigned integer type with a width of exactly 24 bits.

3 These types are optional. However, if an implementation provides integer types with widths of 8, 16, 32, or 64 bits, no padding bits, and (for the signed types) that have a two’s complement representation, it shall define the corresponding typedef names.

7.18.1.2 Minimum-width integer types

1 The typedef name int_leastN_t designates a signed integer type with a width of at least N, such that no signed integer type with lesser size has at least the specified width. Thus, int_least32_t denotes a signed integer type with a width of at least 32 bits.

2 The typedef name uint_leastN_t designates an unsigned integer type with a width of at least N, such that no unsigned integer type with lesser size has at least the specified width. Thus, uint_least16_t denotes an unsigned integer type with a width of at least 16 bits.

3 The following types are required:
int_least8_t
int_least16_t
int_least32_t
int_least64_t
uint_least8_t
uint_least16_t
uint_least32_t
uint_least64_t
All other types of this form are optional.

(and so on, to include the int_fastN_t / uint_fastN_t types, etc.)

It is worth highlighting in paragraph 3:

However, if an implementation provides integer types with widths of 8, 16, 32, or 64 bits, no padding bits, and (for the signed types) that have a two’s complement representation, it shall define the corresponding typedef names.

This means that if, for example, I have a type like int or short which is implemented as a 16-bit integer with two's complement representation, then the implementation shall define int16_t.

The atomic_ types for <stdatomic.h> are also listed in N1548 (reproduced below), but it makes no corresponding requirement that an implementation providing an int16_t must also provide an atomic_int16_t; that is the nature of my question.

7.17.6 Atomic integer and address types

1 For each line in the following table, the atomic type name is declared as the corresponding direct type.

Atomic type name         Direct type
----------------         -----------
atomic_char              _Atomic char
atomic_schar             _Atomic signed char
atomic_uchar             _Atomic unsigned char
atomic_short             _Atomic short
atomic_ushort            _Atomic unsigned short
atomic_int               _Atomic int
atomic_uint              _Atomic unsigned int
atomic_long              _Atomic long
atomic_ulong             _Atomic unsigned long
atomic_llong             _Atomic long long
atomic_ullong            _Atomic unsigned long long
atomic_char16_t          _Atomic char16_t
atomic_char32_t          _Atomic char32_t
atomic_wchar_t           _Atomic wchar_t
atomic_int_least8_t      _Atomic int_least8_t
atomic_uint_least8_t     _Atomic uint_least8_t
atomic_int_least16_t     _Atomic int_least16_t
atomic_uint_least16_t    _Atomic uint_least16_t
atomic_int_least32_t     _Atomic int_least32_t
atomic_uint_least32_t    _Atomic uint_least32_t
atomic_int_least64_t     _Atomic int_least64_t
atomic_uint_least64_t    _Atomic uint_least64_t
atomic_int_fast8_t       _Atomic int_fast8_t
atomic_uint_fast8_t      _Atomic uint_fast8_t
atomic_int_fast16_t      _Atomic int_fast16_t
atomic_uint_fast16_t     _Atomic uint_fast16_t
atomic_int_fast32_t      _Atomic int_fast32_t
atomic_uint_fast32_t     _Atomic uint_fast32_t
atomic_int_fast64_t      _Atomic int_fast64_t
atomic_uint_fast64_t     _Atomic uint_fast64_t
atomic_intptr_t          _Atomic intptr_t
atomic_uintptr_t         _Atomic uintptr_t
atomic_size_t            _Atomic size_t
atomic_ptrdiff_t         _Atomic ptrdiff_t
atomic_intmax_t          _Atomic intmax_t
atomic_uintmax_t         _Atomic uintmax_t

2 The semantics of the operations on these types are defined in 7.17.7.

3 The atomic_bool type provides an atomic boolean.

4 The atomic_address type provides atomic void * operations. The unit of addition/subtraction shall be one byte.

5 NOTE The representation of atomic integer and address types need not have the same size as their corresponding regular types. They should have the same size whenever possible, as it eases effort required to port existing code.

score 5 · Answer 1 · answered Jul 12 '20 at 18:28


This list of specialized atomic types is only there because of a historic accident: the types were meant to ensure compatibility with C++. They were also only meant to provide interfaces for integer types that are mandatory. None of the uintXX_t types are mandatory, and therefore they are not included.

(That goal was immediately watered down by adding atomic_[u]intptr_t, where [u]intptr_t are not mandatory, but that is probably yet another story.)

Jens Gustedt

so `uint_least16_t` and `uint_fast16_t` are required but `uint16_t` is not required?! – Jason S Jul 13 '20 at 17:55
Yes, exactly. The fixed width type aliases are only required if the platform has an integer type of exactly the corresponding width. – Jens Gustedt Jul 13 '20 at 18:32
What I don't understand is why the atomic types don't mirror the stdint.h types -- e.g. if `uint32_t` is required, why not `atomic_uint32_t`? – Jason S Jul 13 '20 at 20:03
(edited my question with some citations from draft standard N1548) – Jason S Jul 13 '20 at 20:40
`uint32_t` is not required in general, only if the platform has an integer type with exactly 32 value bits and no padding bits. – Jens Gustedt Jul 13 '20 at 21:22
...which many embedded processors do, along with `uint8_t` and `uint16_t`. – Jason S Jul 14 '20 at 02:33
And I understand `uint32_t` is not required in general; what I mean is that **if** `uint32_t` is required because of 7.18.1.1 paragraph 3, why is `atomic_uint32_t` not **also** required? – Jason S Jul 14 '20 at 02:35
Because nobody cared enough, I guess. There is not much reason to use any of these types. Use `_Atomic(uint32_t)` instead. – Jens Gustedt Jul 14 '20 at 06:01
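
To illustrate the suggestion in the last comment, here is a minimal sketch of using `_Atomic(uint32_t)` directly where a standard `atomic_uint32_t` name does not exist. The typedef name `my_atomic_u32`, the variable `counter`, and the function `bump` are made up for this example, and it assumes the implementation provides `uint32_t` and `<stdatomic.h>`.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical name: the standard defines no atomic_uint32_t,
   so this typedef is our own. It requires uint32_t to exist. */
typedef _Atomic(uint32_t) my_atomic_u32;

/* Static storage duration, so it starts out as zero. */
static my_atomic_u32 counter;

/* Atomically add 1 and return the new value. */
uint32_t bump(void)
{
    /* atomic_fetch_add returns the value held before the addition. */
    return atomic_fetch_add(&counter, 1u) + 1u;
}
```

The generic functions in `<stdatomic.h>` work on any atomic object, so nothing is lost compared to a predefined typedef except the standardized name.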

Mecki · Answer 2 · 2023-07-07T15:28:23.340

1

Because a platform may not be able to handle uint16_t in an atomic way. If a platform has no native uint16_t type, a compiler can still emulate that type on top of uint32_t but such an emulated type will never be atomic.

Note that all exact width types are optional to begin with. The C standard only requires uint_least16_t and uint_fast16_t to exist. Both are guaranteed to have at least 16 bits, but they may have more. The difference is that the first one is optimized for space (use as little memory as possible, even if that will be slow) and the second one for performance (use the fastest type available, even if that requires lots of memory).

A compiler may offer uint16_t if such a native type is available on the platform or the compiler wants to emulate it, but it is never required to do so. Code that should be compilable with every compiler following the standard must not rely on uint16_t existing in the first place.

The POSIX standard requires uint16_t to exist, so on POSIX platforms the compiler must emulate that type if it is not natively available, but the POSIX standard does not require any type to be atomic at all.
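
As a rough sketch of what "not relying on uint16_t" can look like in practice: `<stdint.h>` defines the limit macro `UINT16_MAX` only when `uint16_t` itself is provided, so code can test for it and fall back to the mandatory `uint_least16_t`. The names `u16_or_wider`, `HAVE_EXACT_U16`, and `add_mod_2_16` are invented for this example.

```c
#include <stdint.h>

/* UINT16_MAX is defined by <stdint.h> only when uint16_t exists. */
#ifdef UINT16_MAX
typedef uint16_t u16_or_wider;        /* exactly 16 bits, no padding */
#define HAVE_EXACT_U16 1
#else
typedef uint_least16_t u16_or_wider;  /* smallest type with >= 16 bits */
#define HAVE_EXACT_U16 0
#endif

/* Add modulo 2^16. The explicit mask makes the result identical
   whether the typedef is exactly 16 bits wide or wider. */
u16_or_wider add_mod_2_16(u16_or_wider a, u16_or_wider b)
{
    return (u16_or_wider)((a + b) & 0xFFFFu);
}
```

With the fallback type, wraparound at 16 bits no longer happens implicitly, which is why the mask is written out.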

edited Jul 07 '23 at 15:28

answered Jul 07 '23 at 12:55

Mecki

*Standard conform code* - I think you mean "fully portable code", i.e. code which will run on any ISO C implementation. I wouldn't say that code using `uint16_t` isn't standards-conforming, just that it won't run on some obscure implementations. (Assuming it avoids UB on any implementation that can compile + run it, by not making any assumptions that aren't guaranteed by the standard beyond the existence of the language features it uses.) – Peter Cordes Jul 07 '23 at 13:28
If the target hardware has atomic RMW of aligned 32-bit integers but not 16-bit, it *could* use that to support `_Atomic uint16_t`. (Doing stores and RMWs with an atomic RMW such as a CAS retry loop to replace the low or high half of the containing word, so performance could be bad. So it might be more accurate to say that it can't *efficiently* do `_Atomic uint16_t`) Anyway, the existence of plain `uint16_t` would allow you to use `_Atomic uint16_t`, so the lack of a typedef for it isn't significant on normal implementations (which do have `uint16_t`.) – Peter Cordes Jul 07 '23 at 13:31
Oh, just saw your answer on [Do we have atomic uint32 type in C?](https://stackoverflow.com/a/76637079) - good point that `sizeof(_Atomic uint16_t)` could be larger than `sizeof(uint16_t)`, so it could use some width that the target supports for atomic ops on a weird machine without 16-bit atomic load/store/RMW (like early DEC Alpha). `atomic_uint16_t` if it existed would be the same way, so IDK why C didn't choose to have implementations provide `atomic_uint16_t` on implementations that provide `uint16_t`. – Peter Cordes Jul 07 '23 at 13:37
@PeterCordes Standard conform may be a bit harsh, as the standard surely does allow exact types, but in the end it boils down to "My code follows ISO-C xx, this compiler is fully ISO-C xx compliant, so I expect that compiler to compile my code" and that's of course only guaranteed if you stick to mandatory features. Everything else is a bit in the realm of "maybe, might, who knows" and for that realm, I usually don't require a standard, as that's what I get without one and that's what standards try to avoid by providing reliability. – Mecki Jul 07 '23 at 15:27
Yes, that's what I thought you meant. Better phrasing in your edit. But this answer seems to be explaining why `uint16_t` is optional. Since `_Atomic uint16_t` is allowed to have a different size, any implementation supporting an equal or wider atomic type can implement it, using extra ALU instructions to wrap it to 16-bit when necessary. This also doesn't explain why `atomic_uint16_t` doesn't exist on implementations with `uint16_t`. Maybe they kept it simple just for length, and/or to not create dependencies between optional implementation choices? IDK. – Peter Cordes Jul 07 '23 at 15:37
@PeterCordes And does PPC have atomic int16 types? IIRC PPC cannot handle int16 very well to begin with; the performance of int16 on 32 bit CPUs was as bad as int64 and the worst performance on 64 bit CPUs. Also int8 performance was not great. One consequence of that was that Apple decided to use 32 bit bools in ObjC on their PPC machines, compared to single bytes ones on Intel; which caused a lot of headaches during the platform migration when dealing with bool pointers and devs were "smart enough" to hardcode the type length. – Mecki Jul 07 '23 at 15:45
I'm not sure that's relevant to whether `atomic_uint16_t` can or should exist. It could be a 32-bit type with only 16 value bits, since `_Atomic uint16_t` isn't constrained by the same requirement to have no padding that `uint16_t` is. (Or maybe that is the reason, that people would assume `atomic_uint16_t` did imply the atomic type was also fixed width if it existed.) – Peter Cordes Jul 07 '23 at 16:23
But anyway, yes, GCC for PowerPC implements a lock-free `_Atomic uint16_t`. https://godbolt.org/z/seasbEh47 On CPUs before POWER8, it looks like 8 and 16-bit RMWs like `fetch_add` operate on the 32-bit word containing the half-word, but pure load and pure store are just `sth` (store half-word) and `lhz` (load half-word zero-extend). Fortunately PowerPC has good bitfield extract and insert with `rlwinm`, and atomic RMWs are expected to be somewhat costly. But still, yeah not until POWER8 did the ISA get hardware support for narrow atomic RMWs. – Peter Cordes Jul 07 '23 at 16:28
Single-byte pure stores can also be a bit less efficient on some non-x86 CPUs since most use 32 or 64-bit ECC granules in L1d cache, so it takes an internal RMW cycle to update a byte within a granule. ([Are there any modern CPUs where a cached byte store is actually slower than a word store?](https://stackoverflow.com/q/54217528)) That might be the motivation for 32-bit `bool` in Apple's PowerPC Objc ABI. Maybe also lock-free atomic bool efficiency, both on single-core with threads and on SMP PowerPCs, like some Mac-clone non-Apple workstations. – Peter Cordes Jul 07 '23 at 16:30
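
The points raised in this thread (that `_Atomic uint16_t` may be wider than `uint16_t`, and that it may or may not be lock-free) can be checked on a given implementation with something like the following sketch. The helper names and the `probe` variable are hypothetical, and the code assumes the implementation provides `uint16_t`.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical probe object; assumes uint16_t exists. */
static _Atomic uint16_t probe;

/* Bytes the atomic version occupies beyond the plain type.
   The standard permits a nonzero result. */
size_t atomic_u16_extra_bytes(void)
{
    return sizeof(_Atomic uint16_t) - sizeof(uint16_t);
}

/* True if operations on the probe object need no lock. */
bool atomic_u16_is_lock_free(void)
{
    return atomic_is_lock_free(&probe);
}

/* Even if the atomic object is stored in a wider container,
   arithmetic on an unsigned atomic still wraps at 16 bits. */
uint16_t wrap_demo(void)
{
    atomic_store(&probe, UINT16_MAX);
    atomic_fetch_add(&probe, 1);
    return atomic_load(&probe);
}
```

On toolchains where narrow atomics are not lock-free, linking may additionally need a support library (for example `-latomic` with GCC); that depends on the target.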

score 0 · Answer 3 · answered Jul 12 '20 at 18:05


I can only guess, but if you can implement atomic access only to things larger than uint16_t, then implementing atomic access to uint_least16_t and uint_fast16_t can always be done by defining the types accordingly, while atomic access to uint16_t may be just impossible with the available hardware. And you don't want anything in the standard that cannot be implemented.

gnasher729

1

`uint16_t` isn't guaranteed to exist at all. I think that's the issue. `_Atomic uint16_t` [isn't guaranteed to have the same size](https://en.cppreference.com/w/c/language/atomic) as `uint16_t`, so it would certainly be possible for an implementation to have only 32-bit atomic ops, but somehow still have `uint16_t`, and implement `_Atomic uint16_t` by padding to 32 bits. But it's only usable on implementations that provide `uint16_t` at all. So `atomic_uint16_t` could have been provided as an optional type. – Peter Cordes Jul 12 '20 at 21:15
Such a machine would have to be able to still implement plain uint16_t without a 32-bit read/modify/write, because separate threads can write adjacent array elements of a non-atomic `uint16_t arr[]`. So a word-addressable DSP with a word-size greater than 16-bits couldn't do that. You could have a machine with byte and 16-bit pure-load and pure-store, but atomic-RMW capability only for 32-bit. I think that's what you're picturing, but like I said you could handle that by having `_Atomic uint16_t` be 4 bytes. But maybe nobody wants that surprise, so the standards committee ruled it out? – Peter Cordes Jul 12 '20 at 21:19
2

*And you don't want anything in the standard that cannot be implemented*. The authors of the Standard may want to avoid suggesting that some implementations are better than others, but **I'd** much rather have the Standard include things that some implementations can't support, than have no standard way to accomplish things that would only be supportable by 99% of target platforms. – supercat Jul 13 '20 at 14:46
1

@supercat: Fortunately, in this case you can simply write `_Atomic uint16_t` on any C implementation that has `uint16_t`, e.g. as part of `typedef _Atomic uint16_t atomic_u16`. There's no downside to that vs. a hypothetical `atomic_uint16_t` except for a standardized name for the type. The implementation widening it to 32-bit (for example) if necessary is provided by the `_Atomic` qualifier, not the convenience typedefs. But in general, 100% agree on C portably exposing more common-but-not-universal CPU functionality, though (e.g. popcount and bit-scan). – Peter Cordes Jul 14 '20 at 06:08
2

@PeterCordes: Unfortunately, the Standard doesn't allow for the possibility that an implementation might be able to atomically perform some useful operations directly on native types, but not perform a compare-and-swap on them. If the Standard defined optional functions that would perform atomic operations directly on native types, and which segregated out functions that report various levels of detail about what they did, that would allow existing implementations to be efficiently upgraded merely by adding a library, without requiring that the compiler know or care about atomic types. – supercat Jul 14 '20 at 14:55
1

@PeterCordes: Separating out levels of reporting would be important on platforms like original 8086 or many microcontrollers which have read-modify-write instructions which operate atomically, and set flags based upon what they did, but don't capture the entire old or new value. Even if a platform has compare-and-swap, processing a decrement as e.g. `sub dword [esi], 1 / sbb eax,eax / ret` may be much more efficient than trying to use a compare-and-swap loop if what code cares about is whether the value was decremented past zero [use -1 as the 'idle state' value]. – supercat Jul 14 '20 at 15:02
1

@PeterCordes: Further, on many freestanding implementations, the notion of "lock-based" atomic operations would be fundamentally broken. An atomic library could fairly easily be designed to operate usefully if multiple threads would try to operate on an object simultaneously, or if a main thread and a signal try to do likewise, but it would have to know either that the first operation can complete if the second one blocks, or that the first will be stalled until the second one completes. Freestanding implementations would have no way of knowing which scenario applies. – supercat Jul 14 '20 at 15:08
@supercat "freestanding" = ? – Jason S Jul 15 '20 at 02:49
1

@JasonS: A "hosted implementation" is generally one that is designed to generate code that performs I/O using an operating system the implementation knows about, while a "freestanding implementation" is designed to run in contexts where there may not *be* an operating system that the implementation knows about. Freestanding implementations are often used to write code for micro-controller based devices like thermostats, appliances, electronic toys, etc. which often have no "operating system" other than the code supplied by the programmer. For example... – supercat Jul 15 '20 at 03:15
...a microcontroller may have a configurable timer which will cause a specified routine to be executed 1,000 times/second (via fashion similar to raising a signal), and the programmer may use that routine to take care of any background tasks that need to be accomplished, even though there's no formal "operating system" as such. – supercat Jul 15 '20 at 03:16
OK, yes, I've been working on freestanding implementations for nearly 25 years, just that I've never heard them called that. ("bare-metal embedded systems" is the buzzword I'm used to) – Jason S Jul 15 '20 at 16:03
@supercat p.s. would you mind sending me a message privately? (via twitter might be easiest; username is in my profile) I have a question. – Jason S Jul 15 '20 at 16:18
@JasonS: The C Standard uses the terms "hosted implementation" and "freestanding implementation". It doesn't actually specify any means by which freestanding implementations can do anything non-trivial, but expects such things to be platform-specific quality-of-implementation issues. I don't have a twitter account, but feel free to continue the discussion in chat. – supercat Jul 15 '20 at 16:20
