When is uint8_t ≠ unsigned char?

Question

According to C and C++, CHAR_BIT >= 8.
But whenever CHAR_BIT > 8, uint8_t can't even be represented as 8 bits.
It must be larger, because CHAR_BIT is the minimum number of bits for any data type on the system.

On what kind of a system can uint8_t be legally defined to be a type other than unsigned char?

(If the answer is different for C and C++ then I'd like to know both.)

I wonder if it's legal to have a `char` with only 7 real bits and 1 padding bit. — Mysticial, Apr 22 '13 at 01:48
@Mysticial: Nope, I think all `char`s must have all of their representation bits participate in determining their value. — user541686, Apr 22 '13 at 01:49
Or maybe a 16-bit `uint8_t` where 8 is real and 8 is padding. I'd shoot whoever made such an environment though. :) — Mysticial, Apr 22 '13 at 01:50
The C++ standard lists it as optional. `typedef signed integer type int8_t; // optional` — Rapptz, Apr 22 '13 at 01:50
@Mysticial: Not sure that's allowed either, because the width is supposed to be exactly 8 bits. :P — user541686, Apr 22 '13 at 01:55
@Mysticial: [u]int*_t is required by the standard to have no padding bits and to be twos-complement if signed. — R.. GitHub STOP HELPING ICE, Apr 22 '13 at 02:13
@Mysticial: Wouldn't you lose range if that happened? Meaning, you need all 8 bits to represent some `char`. — , Apr 22 '13 at 12:54
@Mysticial: Such environments do exist (it's common for DSP architectures to be unable to address anything smaller than a word); in that case, `uint8_t` shouldn't exist at all. — Mike Seymour, Apr 24 '13 at 22:44
Possible duplicate of [uint8\_t vs unsigned char](http://stackoverflow.com/questions/1725855/uint8-t-vs-unsigned-char) — Ciro Santilli OurBigBook.com, Nov 13 '16 at 09:58

R.. GitHub STOP HELPING ICE · Accepted Answer · 2013-04-22T12:51:22.707

63

If it exists, uint8_t must always have the same width as unsigned char. However, it need not be the same type; it may be a distinct extended integer type. It also need not have the same representation as unsigned char; for instance, the bits could be interpreted in the opposite order. This is a silly example, but it makes more sense for int8_t, where signed char might be ones complement or sign-magnitude while int8_t is required to be twos complement.
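As a rough illustration of the two's complement guarantee, here is a small C++ sketch. The helper names `raw_bits` and `check_twos_complement` are made up for this example and are not part of any library. The sketch copies the single byte of an `int8_t` into a `uint8_t` so the stored bit pattern can be inspected, and it only compiles where the optional exact-width types exist.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: return the object representation of an int8_t.
// Both types are exactly 8 bits wide with no padding, so one byte is copied.
std::uint8_t raw_bits(std::int8_t value) {
    static_assert(sizeof(std::int8_t) == sizeof(std::uint8_t), "same width");
    std::uint8_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// int8_t is required to be two's complement, so these patterns are fixed.
// A ones' complement signed char would store -1 as 0xFE instead.
void check_twos_complement() {
    assert(raw_bits(-1) == 0xFF);
    assert(raw_bits(INT8_MIN) == 0x80);
}
```

No such guarantee holds for `signed char` under the standards quoted in this thread, which is why the two types can differ in representation even when they have the same width.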

One further "advantage" of using a non-char extended integer type for uint8_t even on "normal" systems is C's aliasing rules. Character types are allowed to alias anything, which prevents the compiler from heavily optimizing functions that use both character pointers and pointers to other types, unless the restrict keyword has been applied well. However, even if uint8_t has the exact same size and representation as unsigned char, if the implementation made it a distinct, non-character type, the aliasing rules would not apply to it, and the compiler could assume that objects of types uint8_t and int, for example, can never alias.
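A minimal C++ sketch of the character-type aliasing rule described above. The function names are invented for illustration. It uses `unsigned char` explicitly, so the behavior is well defined no matter how `uint8_t` is defined.

```cpp
#include <cassert>
#include <cstddef>

// Writes through an unsigned char pointer may legally modify an object of
// any type, so the compiler has to re-read *counter after the loop.
int store_then_read(unsigned char* bytes, int* counter) {
    *counter = 1;
    for (std::size_t i = 0; i < sizeof(int); ++i) {
        bytes[i] = 0;  // may overwrite the bytes of *counter
    }
    return *counter;   // 0 if bytes points at *counter, 1 otherwise
}

void demo_aliasing() {
    int x = 5;
    // Aliased call: the byte stores zero out x itself.
    assert(store_then_read(reinterpret_cast<unsigned char*>(&x), &x) == 0);

    unsigned char separate[sizeof(int)];
    int y = 5;
    // Non-aliased call: the stores land in a different object.
    assert(store_then_read(separate, &y) == 1);
}
```

If `uint8_t` were a distinct non-character type, a version of `store_then_read` taking `uint8_t*` could be compiled to return 1 unconditionally, and the aliased call would have undefined behavior.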

edited Apr 22 '13 at 12:51

answered Apr 22 '13 at 02:17

R.. GitHub STOP HELPING ICE

If I'm to believe the snippet of draft standard posted in another answer, `uint8_t` must be defined as a typedef. – Mark Ransom Apr 22 '13 at 02:34
12

`typedef __uint8_t uint8_t;` is a typedef. – R.. GitHub STOP HELPING ICE Apr 22 '13 at 02:36
3

In the interest of humour, perhaps an implementation might decide to be consistent with its naming conventions and, in contrast to `long long`, it might introduce a `short short`. Hence, `typedef short short int8_t;`... – autistic Apr 22 '13 at 02:42
26

In 2003 ± 2 (not going to go dig it up in the mail archives right now), the GCC team contemplated making `[u]int8_t` special extended integer types exactly so that it could be optimized more aggressively ... but eventually rejected the notion on the grounds that programmers are very likely to expect them to have the same special aliasing properties as `char`. (This was around the same time we were getting screamed at by the kernel people for doing type-based alias analysis *at all*, so we were all a little skittish.) – zwol Apr 24 '13 at 22:54
3

@Zack: Thanks for the interesting historical note. It would be nice if gcc still provided those types, but didn't use them by default, so that a feature test macro or similar could switch to them, enabling the more aggressive optimization. – R.. GitHub STOP HELPING ICE Apr 24 '13 at 23:26
1

@Zack interesting, well this issue popped up in a [question today](http://stackoverflow.com/a/26299253/1708801) and I don't see a portable workaround, which is unfortunate. +1 btw. – Shafik Yaghmour Oct 10 '14 at 20:35
1

@ShafikYaghmour: Nice question. The trivial workaround, however, is to use the `restrict` keyword, or to copy the pointer to a local variable whose address is never taken so that the compiler does not need to worry about whether the `uint8_t` objects can alias it. – R.. GitHub STOP HELPING ICE Oct 10 '14 at 21:57
@R.. thank you for the suggestion, the OP posted a follow-up question and [stated that __restrict__ did not work in gcc for them](http://stackoverflow.com/questions/26297571/how-to-create-an-uint8-t-array-that-does-not-undermine-strict-aliasing#comment41264766_26297914) but the other suggestion did. – Shafik Yaghmour Oct 11 '14 at 01:38
3

Divorcing `uint8_t` from character types was actually discussed at the GCC bugzilla: see . – user3840170 Jan 04 '21 at 20:35
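The local-copy workaround from the comments above might look roughly like this in C++. `Buffer` and the function names are hypothetical, chosen only for this sketch. Whether a particular compiler really reloads the members in the first version depends on the compiler and optimization level, so treat this as an illustration of the idea rather than a measured result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical struct for illustration.
struct Buffer {
    std::uint8_t* data;
    std::size_t size;
};

// If uint8_t is a character type, each store through b.data could, as far
// as the compiler knows, modify b.data or b.size themselves, so they may
// have to be reloaded on every iteration.
void fill_direct(Buffer& b, std::uint8_t value) {
    for (std::size_t i = 0; i < b.size; ++i) {
        b.data[i] = value;
    }
}

// Copy the members to locals whose addresses are never taken: no store
// through a pointer can change them, so no reload is needed.
void fill_local_copy(Buffer& b, std::uint8_t value) {
    std::uint8_t* const data = b.data;
    const std::size_t size = b.size;
    for (std::size_t i = 0; i < size; ++i) {
        data[i] = value;
    }
}

// Both variants produce the same result; only the generated code may differ.
void check_fill_variants() {
    std::uint8_t first[4] = {0, 0, 0, 0};
    std::uint8_t second[4] = {0, 0, 0, 0};
    Buffer a{first, sizeof first};
    Buffer b{second, sizeof second};
    fill_direct(a, 7);
    fill_local_copy(b, 7);
    for (std::size_t i = 0; i < 4; ++i) {
        assert(first[i] == 7 && second[i] == 7);
    }
}
```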

autistic · Answer 2 · 2015-12-26T04:14:10.447

33

On what kind of a system can uint8_t be legally defined to be a type other than unsigned char?

In summary, uint8_t can only be legally defined on systems where CHAR_BIT is 8. It's an addressable unit with exactly 8 value bits and no padding bits.

In detail, CHAR_BIT defines the width of the smallest addressable units, and uint8_t can't have padding bits; it can only exist when the smallest addressable unit is exactly 8 bits wide. Providing CHAR_BIT is 8, uint8_t can be defined by a type definition for any 8-bit unsigned integer type that has no padding bits.
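The relationship between `CHAR_BIT` and `uint8_t` can be checked at compile time. The following C++ sketch assumes a C++11 or later `<cstdint>`, where `UINT8_MAX` is defined only if `uint8_t` is provided. The constant `has_uint8` is a name introduced here for illustration.

```cpp
#include <climits>
#include <cstdint>
#include <limits>

#ifdef UINT8_MAX  // defined only when the implementation provides uint8_t
static_assert(CHAR_BIT == 8, "uint8_t exists, so bytes must be 8 bits wide");
static_assert(sizeof(std::uint8_t) == 1, "uint8_t occupies exactly one byte");
static_assert(std::numeric_limits<std::uint8_t>::digits == 8,
              "8 value bits and no padding bits");
static_assert(UINT8_MAX == 255, "largest value is 2^8 - 1");
constexpr bool has_uint8 = true;   // hypothetical name for this sketch
#else
constexpr bool has_uint8 = false;  // e.g. a platform where CHAR_BIT is 16
#endif
```

On a conforming implementation none of these assertions can fire: they only restate what the existence of `uint8_t` already implies.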

Here's what the C11 standard draft (n1570.pdf) says:

5.2.4.2.1 Sizes of integer types

1 The values given below shall be replaced by constant expressions suitable for use in #if preprocessing directives. ... Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.

-- number of bits for smallest object that is not a bit-field (byte)
   CHAR_BIT                                            8

Thus the smallest objects must contain exactly CHAR_BIT bits.

6.5.3.4 The sizeof and _Alignof operators

...

4 When sizeof is applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1. ...

Thus, those are (some of) the smallest addressable units. Obviously int8_t and uint8_t may also be considered smallest addressable units, providing they exist.

7.20.1.1 Exact-width integer types

1 The typedef name intN_t designates a signed integer type with width N, no padding bits, and a two’s complement representation. Thus, int8_t denotes such a signed integer type with a width of exactly 8 bits.

2 The typedef name uintN_t designates an unsigned integer type with width N and no padding bits. Thus, uint24_t denotes such an unsigned integer type with a width of exactly 24 bits.

3 These types are optional. However, if an implementation provides integer types with widths of 8, 16, 32, or 64 bits, no padding bits, and (for the signed types) that have a two’s complement representation, it shall define the corresponding typedef names.

The emphasis on "These types are optional" is mine. I hope this was helpful :)

edited Dec 26 '15 at 04:14

answered Apr 22 '13 at 01:54

autistic

So what's the purpose of `uint8_t` if it's never different from `unsigned char`? – user541686 Apr 22 '13 at 01:55
4

@Mehrdad I guess in the case where you actually need an `int8`, it would not compile at all when `CHAR_BIT > 8`, since `int8_t` wouldn't even exist. Whereas if you used `char` and `CHAR_BIT > 8`, then you might get a half-broken build. – Mysticial Apr 22 '13 at 01:56
@Mysticial: Weird, couldn't you already just say `#if CHAR_BIT > 8... #error ZOMG... #endif` if your program is supposed to not work for those systems? – user541686 Apr 22 '13 at 01:57
7

It is different from `unsigned char`. `unsigned char` is guaranteed to exist, but is only guaranteed to be 8 bits when `CHAR_BIT == 8`. `uint8_t` isn't guaranteed to exist, but is guaranteed to be 8 bits when it does. – autistic Apr 22 '13 at 01:58
15

There's a subtle difference between `char` and `int8_t`, besides the width. A `char` might use ones' complement, two's complement or sign-and-magnitude representation, whereas an `int8_t` is required to use a two's complement representation. – autistic Apr 22 '13 at 02:02
6

I always thought the point of all the specific-size types was so that if something weird was going on, things either kept working or broke right away and told you so. They're also far more readable, when you're not working with `char`s. – ssube Apr 22 '13 at 02:03
It would also be good to say if `char` is guaranteed to have `CHAR_BIT` bits and quote the standard for that. – Ciro Santilli OurBigBook.com Nov 19 '18 at 12:27
Hey, if you decide to delete your answers, do ping me so I can repost them and get rep haha But I have enough rep for job market now, I'm just doing this to save the world. What I really want now is to make money. – Ciro Santilli OurBigBook.com Nov 19 '18 at 20:37
@autistic I thought it was not possible to retract the CC BY-SA of one's answers. But you should chill, I'm just joking. – Ciro Santilli OurBigBook.com Nov 20 '18 at 00:45
OK, I see what you mean. – Ciro Santilli OurBigBook.com Nov 21 '18 at 09:09
@autistic `char` can be `unsigned`, `int8_t` is signed. – 12431234123412341234123 Dec 24 '20 at 16:44
@12431234123412341234123 true. I can't edit that into the comment, but let it be known that I meant to mention the possibility of `char` being an unsigned type. – autistic May 05 '21 at 23:46

zwol · Answer 3 · 2013-04-24T22:41:38.763

8

A possibility that no one has so far mentioned: if CHAR_BIT==8 and unqualified char is unsigned, which it is in some ABIs, then uint8_t could be a typedef for char instead of unsigned char. This matters at least insofar as it affects overload choice (and its evil twin, name mangling), i.e. if you were to have both foo(char) and foo(unsigned char) in scope, calling foo with an argument of type uint8_t would prefer foo(char) on such a system.
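A small C++ sketch of the overload effect described above. The enum `Picked` and the helper `picked_for_uint8` are invented for this example. The `foo(int)` overload is an addition not mentioned in the answer: it is there so the call still resolves, via integral promotion, if `uint8_t` happens to be a distinct type that is none of the three character types.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical tag type reporting which overload was selected.
enum class Picked { Char, UnsignedChar, SignedChar, Int };

constexpr Picked foo(char)          { return Picked::Char; }
constexpr Picked foo(unsigned char) { return Picked::UnsignedChar; }
constexpr Picked foo(signed char)   { return Picked::SignedChar; }
// Reached by integral promotion if uint8_t is not a character type at all.
constexpr Picked foo(int)           { return Picked::Int; }

// Which overload a uint8_t argument selects depends on what the
// implementation's typedef really names.
constexpr Picked picked_for_uint8() {
    return foo(std::uint8_t{0});
}
```

On the common implementations where `uint8_t` is `unsigned char`, `picked_for_uint8()` yields `Picked::UnsignedChar`. On the kind of system this answer describes it would yield `Picked::Char` instead.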

edited Apr 24 '13 at 22:41

answered Apr 24 '13 at 22:30

zwol

1

"However, it need not be the same type; it may be a distinct extended integer type." covers that in part, although it's true it might easily be overlooked. – Luc Danton Apr 24 '13 at 22:35
2

@LucDanton `char` is not an *extended* integer type. – zwol Apr 24 '13 at 22:40
2

"it need not be the same type" is the relevant part. I took the rest to be an example. – Luc Danton Apr 24 '13 at 22:54
