Can uint8_t be a non-character type?

Question

In this answer and the attached comments, Pavel Minaev makes the following argument that, in C, the only types to which uint8_t can be typedef'd are char and unsigned char. I'm looking at this draft of the C standard.

The presence of uint8_t implies the presence of a corresponding type int8_t (7.18.1p1).
int8_t is 8 bits wide and has no padding bits (7.18.1.1p1).
Corresponding types have the same width (6.2.5p6), so uint8_t is also 8 bits wide.
unsigned char is CHAR_BIT bits wide (5.2.4.2.1p2 and 6.2.6.1p3).
CHAR_BIT is at least 8 (5.2.4.2.1p1).
CHAR_BIT is at most 8, because either uint8_t is unsigned char, or it's a non-unsigned char, non-bit-field type whose width is a multiple of CHAR_BIT (6.2.6.1p4).
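
The chain of reasoning above can be spot-checked at compile time. The following is only a minimal sketch, assuming a C11 compiler (for `_Static_assert`) and using the `UINT8_MAX` macro as the indicator that `uint8_t` exists; the function name `uint8_matches_unsigned_char` is made up for illustration. It confirms that size and range coincide with `unsigned char`, which says nothing about whether the two are the same type.

```c
#include <limits.h>
#include <stdint.h>

#ifdef UINT8_MAX
/* uint8_t has 8 value bits, no padding, and occupies whole bytes,
   so a byte cannot be wider than 8 bits. */
_Static_assert(CHAR_BIT == 8, "uint8_t exists, so CHAR_BIT must be 8");
_Static_assert(sizeof(uint8_t) == 1, "uint8_t occupies exactly one byte");
_Static_assert(UINT8_MAX == UCHAR_MAX, "same range as unsigned char");
#endif

/* Hypothetical helper: returns 1 if uint8_t exists and has the size and
   range of unsigned char, 0 if the implementation has no uint8_t. */
int uint8_matches_unsigned_char(void)
{
#ifdef UINT8_MAX
    return sizeof(uint8_t) == sizeof(unsigned char) && UINT8_MAX == UCHAR_MAX;
#else
    return 0;
#endif
}
```

If any of the three assertions fired on an implementation that defines `uint8_t`, that implementation would not conform to the clauses cited in the list.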

Based on this argument, I agree that, if uint8_t exists, then both it and unsigned char have identical representations: 8 value bits and 0 padding bits. That doesn't seem to force them to be the same type (e.g., 6.2.5p14).

Is it allowed that uint8_t is typedef'd to an extended unsigned integer type (6.2.5p6) with the same representation as unsigned char? Certainly it must be typedef'd (7.18.1.1p2), and it cannot be any standard unsigned integer type other than unsigned char (or char if it happens to be unsigned). This hypothetical extended type would not be a character type (6.2.5p15) and thus would not qualify for aliased access to an object of an incompatible type (6.5p7), which strikes me as the reason a compiler writer would want to do such a thing.
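
To make the aliasing concern concrete, here is a small sketch; the function names `byte_sum` and `copy_to_bytes` are hypothetical. Inspecting any object through an `unsigned char` lvalue is permitted by 6.5p7. Doing the same through a `uint8_t` lvalue is covered by that rule only if `uint8_t` is a character type, so the second function uses `memcpy`, which should be well-defined either way.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sum the bytes of an arbitrary object through an unsigned char lvalue.
   6.5p7 allows this access for an object of any type. */
unsigned long byte_sum(const void *obj, size_t size)
{
    const unsigned char *p = obj;
    unsigned long sum = 0;
    for (size_t i = 0; i < size; ++i)
        sum += p[i];
    return sum;
}

/* Copy an object's bytes into a uint8_t buffer without ever reading the
   source through a uint8_t lvalue. This does not depend on uint8_t being
   a character type. */
void copy_to_bytes(uint8_t *dst, const void *src, size_t size)
{
    memcpy(dst, src, size);
}
```

Code written as `const uint8_t *p = obj; ... p[i]` on a non-character object would behave identically on implementations where `uint8_t` is `unsigned char`, which is why the difference is easy to overlook.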

Not an answer to your explicit question, but if `char` is an unsigned type, then `uint8_t` could be typedef'ed to `char` instead of `unsigned char`. — Daniel Fischer, Sep 30 '12 at 23:57
At one point (several years ago) there was serious discussion within the GCC project of adding an extended integer type with exactly the properties you are describing---eight bits wide, not a character type, and not being a special case in type-based alias analysis---but as far as I know it never went anywhere, and there was no suggestion of its being the underlying type for `uint8_t` (possibly just because nobody thought of it at the time). — zwol, Oct 01 '12 at 00:11
Note that `uint8_t` doesn't need to exist at all. It's only conditionally defined. — Kerrek SB, Oct 01 '12 at 00:14
@KerrekSB Yep. Implementations that don't use two's-complement seem like a purely theoretical possibility going forward, and I don't really know how implementations with `CHAR_BIT != 8` would handle binary I/O, so I choose not to support them. Everything else will have `uint8_t`. — user1710520, Oct 01 '12 at 00:33
By rights you're now the muggins at your place of work, who has to find all the code in which `uint8_t` is used where it should have been `unsigned char` and which therefore violates strict aliasing on any such implementation. In compensation, you get to be very smug about "I *told* you we shouldn't just write our code assuming `CHAR_BIT == 8`, if you hadn't assumed that you'd never have used `uint8_t` for a byte in the first place, and you wouldn't have made this mistake". — Steve Jessop, Oct 01 '12 at 00:42
@SteveJessop: +1 the point about aliasing violations by virtue of not being a character type is something I'd missed. With that in mind, using `uint8_t` could give much better performance on such implementations (since the compiler can assume it never aliases). — R.. GitHub STOP HELPING ICE, Oct 01 '12 at 01:09
@R..: EEEK! Stuff like that is why I despise C's attitude toward Undefined Behavior. "Hey, if we say that people's production code is broken and don't have to make it continue to work, we can make it run faster!" — supercat, Aug 26 '14 at 13:13

score 5 · Answer 1 · answered Oct 01 '12 at 00:28

If uint8_t exists, the no-padding requirement implies that CHAR_BIT is 8. However, there's no fundamental reason I can find why uint8_t could not be defined with an extended integer type. Moreover, there is no guarantee that the representations are the same; for example, the bits could be interpreted in the opposite order.

While this seems silly and gratuitously unusual for uint8_t, it could make a lot of sense for int8_t. If a machine natively uses ones' complement or sign/magnitude, then signed char is not suitable for int8_t. However, it could use an extended signed integer type that emulates two's complement to provide int8_t.
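
As an illustration of why the native `signed char` would not do on such a machine, the sketch below decodes the same 8-bit pattern under the three signed representations the standard permits. The function names are hypothetical, and this only models the value interpretation, not how an implementation would actually emulate the arithmetic. Only two's complement, which 7.18.1.1p1 requires for `int8_t`, reaches -128.

```c
/* Each function interprets the low 8 bits of `bits` as a signed value. */

int decode_twos_complement(unsigned bits)
{
    bits &= 0xFFu;
    /* Sign bit set: value is the pattern minus 2^8. */
    return (bits & 0x80u) ? (int)bits - 256 : (int)bits;
}

int decode_ones_complement(unsigned bits)
{
    bits &= 0xFFu;
    /* Sign bit set: negate the complemented low 7 bits. */
    return (bits & 0x80u) ? -(int)(~bits & 0x7Fu) : (int)bits;
}

int decode_sign_magnitude(unsigned bits)
{
    bits &= 0xFFu;
    /* Sign bit set: negate the low 7 bits as they are. */
    return (bits & 0x80u) ? -(int)(bits & 0x7Fu) : (int)bits;
}
```

For the pattern 0x80 these give -128, -127, and 0 (a negative zero) respectively, so an 8-bit ones' complement or sign/magnitude type cannot cover the range that `int8_t` must have.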

answered Oct 01 '12 at 00:28

R.. GitHub STOP HELPING ICE

"for example, the bits could be interpreted in the opposite order". Does that imply that the bits can be interpreted in the opposite order in `unsigned char` vs (each individual byte of) `unsigned int`? Or is it only extended integer types that can cross-wire the data bus ;-) – Steve Jessop Oct 01 '12 at 00:45
@SteveJessop: `unsigned int` is typically N bytes (where N is typically 4) interpreted in either little endian or big endian order, but the standard doesn't say anything about the order of the bytes or even the bits. I'm not sure if your "cross-wire the data bus" question was serious, but there's no reason that `*(unsigned char*)&(uint8_t){1}` has to be 1 and not some other power of two. (The "pure binary" requirement does mean it's *some* power of two, however.) – R.. GitHub STOP HELPING ICE Oct 01 '12 at 01:06
I didn't seriously mean that it would be achieved by physically cross-wiring the bus, but other than that the question was serious. It's just I don't think I've ever considered before that `unsigned int a = 1; for (unsigned char *p = (unsigned char*)&a; p < (unsigned char*)(&a+1); ++p) if (*p == 1) return;` need not return. Likewise, I guess something silly like `unsigned int a = 0xF; strlen((char*)&a);` is a potential buffer overrun provided `sizeof(a) >= 4` (4 being the number of bits set in the value), unless the standard says that the bytes of the object repr. are *contiguous* value bits. – Steve Jessop Oct 01 '12 at 01:24
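
The two expressions discussed in these comments can be tried out directly. This is just a sketch with made-up function names: `repr_of_one` evaluates the equivalent of `*(unsigned char*)&(uint8_t){1}`, and `has_byte_equal_to_one` is the loop from the last comment. Going by the comments above, the only portable expectation is that the first result is some power of two; on mainstream implementations it is presumably just 1.

```c
#include <stddef.h>
#include <stdint.h>

/* The single byte of the object representation of a uint8_t holding 1. */
unsigned char repr_of_one(void)
{
    uint8_t one = 1;
    return *(unsigned char *)&one;
}

/* Nonzero if exactly one bit of x is set. */
int is_power_of_two(unsigned x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Scan the bytes of an unsigned int for one whose value is exactly 1.
   Per the discussion above, this is not guaranteed to succeed for
   value == 1 on an exotic implementation. */
int has_byte_equal_to_one(unsigned int value)
{
    const unsigned char *p = (const unsigned char *)&value;
    for (size_t i = 0; i < sizeof value; ++i)
        if (p[i] == 1)
            return 1;
    return 0;
}
```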

score 4 · Answer 2 · answered Oct 01 '12 at 00:28

In 6.3.1.1 (1) (of the N1570 draft of the C11 standard), we can read

The rank of any standard integer type shall be greater than the rank of any extended integer type with the same width.

So the standard explicitly allows the presence of extended integer types of the same width as a standard integer type.

There is nothing in the standard prohibiting a

typedef implementation_defined_extended_8_bit_unsigned_integer_type uint8_t;

if that extended integer type matches the specifications for uint8_t (no padding bits, width of 8 bits), as far as I can see.

So yes, if the implementation provides such an extended integer type, uint8_t may be typedef'ed to that.
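
Which case a given implementation falls into can be probed with a C11 generic selection. The following is a minimal sketch; the enum and the function name `uint8_kind` are invented for this example. The `default` branch would be taken only if `uint8_t` were an extended integer type like the hypothetical one above.

```c
#include <stdint.h>

enum uint8_kind {
    UINT8_IS_UNSIGNED_CHAR,
    UINT8_IS_CHAR,
    UINT8_IS_OTHER
};

/* _Generic selects on the type of (uint8_t)0 without integer promotion,
   so it distinguishes unsigned char, plain char, and anything else. */
enum uint8_kind uint8_kind(void)
{
    return _Generic((uint8_t)0,
                    unsigned char: UINT8_IS_UNSIGNED_CHAR,
                    char:          UINT8_IS_CHAR,
                    default:       UINT8_IS_OTHER);
}
```

I am not aware of an implementation where this returns `UINT8_IS_OTHER`, which matches the "hypothetical" framing of the question.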

score 0 · Answer 3 · answered Jan 25 '15 at 05:30

uint8_t may exist and be a distinct type from unsigned char.

One significant implication of this is in overload resolution; it is platform-dependent whether:

uint8_t by = 0;
std::cout << by;

uses

operator<<(ostream, char)
operator<<(ostream, unsigned char) or
operator<<(ostream, int)

answered Jan 25 '15 at 05:30

Ben Voigt

This is a C question. You'd have to provide C++ references for all of the stuff that OP quoted in order to back up your claim. – M.M Jul 28 '15 at 06:45
Did you mean to say "`uint8_t` may exist and be a distinct type from both `char` and `unsigned char`"? Or are you only claiming that it may be distinct from `unsigned char` because of the case where it is `char`? Also, why am I getting deja vu writing this – M.M Jul 28 '15 at 06:46

score -1 · Answer 4 · answered Oct 01 '12 at 00:07

int8_t and uint8_t differ only by REPRESENTATION and NOT the content (bits). int8_t uses the lower 7 bits for data, and the 8th bit represents the "sign" (positive or negative). Hence the range of int8_t is from -128 to +127 (0 is considered a positive value).

uint8_t is also 8 bits wide, BUT the data contained in it is ALWAYS positive. Hence the range of uint8_t is from 0 to 255.

Considering this fact, char is 8 bits wide. unsigned char would also be 8 bits wide but without the "sign". Similarly short and unsigned short are both 16 bits wide.

IF, however, "unsigned int" were 8 bits wide, then, since C isn't too type-nazi, it IS allowed. And why would a compiler writer allow such a thing? READABILITY!

edited Oct 01 '12 at 00:37

answered Oct 01 '12 at 00:07

Aniket Inge

The width of `unsigned int` is no less than 16. – user1710520 Oct 01 '12 at 00:08
according to the specifications YES. But we are seeing this hypothetically aren't we? – Aniket Inge Oct 01 '12 at 00:09
Yes, I say "hypothetically" because I am not aware of an existing implementation that has a non-character `uint8_t`. – user1710520 Oct 01 '12 at 00:11
but it's 8 bits long :) As I said, you _can_ do it this way as well – Aniket Inge Oct 01 '12 at 00:27
obviously now you can't assign values to it directly and access each bit – Aniket Inge Oct 01 '12 at 00:28
No, because the standard forbids it. 7.18.1.1: *The typedef name uintN_t designates an unsigned integer type with width N . Thus, uint24_t denotes an unsigned integer type with a width of exactly 24 bits.* – R.. GitHub STOP HELPING ICE Oct 01 '12 at 00:31
