287

What is the advantage of using uint8_t over unsigned char in C?

I know that on almost every system uint8_t is just a typedef for unsigned char, so why use it?

Joachim Sauer
  • 302,674
  • 57
  • 556
  • 614
Frames Catherine White
  • 27,368
  • 21
  • 87
  • 137

8 Answers

280

It documents your intent - you will be storing small numbers, rather than a character.

Also it looks nicer if you're using other typedefs such as uint16_t or int32_t.
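As a minimal sketch of that readability argument (field names are hypothetical), compare a struct written with the fixed-width typedefs against the same struct in plain built-in types:

```c
#include <stdint.h>

/* Fixed-width typedefs make the storage intent explicit and uniform. */
struct sample {
    uint8_t  flags;   /* a small number, not a character */
    uint16_t length;
    int32_t  offset;
};

/* The built-in-type version forces the reader to guess the intent,
 * and the widths vary across platforms. */
struct sample_legacy {
    unsigned char  flags;
    unsigned short length;
    int            offset;
};
```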

the swine
  • 10,713
  • 7
  • 58
  • 100
Mark Ransom
  • 299,747
  • 42
  • 398
  • 622
  • 12
    Explicitly using `unsigned char` or `signed char` documents the intent too, since unadorned `char` is what shows you're working with characters. – caf Nov 12 '09 at 23:37
  • @caf: If you're lucky enough to get beyond an unadorned 'unsigned' to begin with, which I still see people doing to let the platform pick if it's int or char by default. But, I think, in this day and age 'unsigned' (alone, or adorned) indicates the intent adequately, otherwise a simple process of elimination explains it :) – Tim Post Nov 29 '09 at 16:50
  • 10
    I thought an unadorned `unsigned` was `unsigned int` by definition? – Mark Ransom Nov 29 '09 at 19:29
  • 7
    @endolith, using uint8_t for a string isn't necessarily wrong, but it's definitely weird. – Mark Ransom Nov 14 '11 at 13:58
    So a uint8_t can hold an integer value between 0 and 255, i.e. 8 binary bits. Just like when the syntax was unsigned char, but with much better grammar. – Hellonearthis Dec 27 '12 at 03:52
  • @NickSoft, the question wasn't about those other types so I didn't get into that. And unfortunately `unsigned char` and `uint8_t` aren't distinct types, see for example http://ideone.com/GMV0uD – Mark Ransom Apr 03 '13 at 20:32
    hmm. I'm sorry for giving wrong information... I knew that enum and int are distinct types and I assumed that it's the same for other types that can be auto-casted to int. Or maybe it depends on the compiler... – NickSoft Apr 04 '13 at 08:39
  • 6
    @endolith, I think I can make a case for uint8_t with UTF8 text. Indeed, `char` seems to imply a character, whereas in the context of a UTF8 string, it may be just one byte of a multibyte character. Using uint8_t could make it clear that one shouldn't expect a character at every position -- in other words that each element of the string/array is an arbitrary integer that one shouldn't make any semantic assumptions about. Of course all C programmers know this, but it may push beginners to ask the right questions. – tne Jan 16 '14 at 11:46
  • 2
    I have to say, `unsigned char` isn't really used to store characters in the first place, so the "intent" issue is moot. – user541686 Jul 22 '14 at 11:00
  • Well, that's historical. I think we can assume that it *was* used to store characters "in the first place" (original intent; that `char` is an abbreviation of character is fairly unambiguous), but indeed doesn't *in practice* because it was *historically* the only standard 8-bit datatype until C99 `inttypes.h` appeared. Now that we have `inttypes.h`, I feel it's in fact all about intent when comparing the original datatypes and the newer `(u)int_(least/fast)N_t` datatypes, and about intent and assurance that the code either compiles with exact width or not at all when it comes to `(u)intN_t`. – tne Nov 27 '17 at 02:01
89

Just to be pedantic, some systems may not have an 8-bit type. According to Wikipedia:

An implementation is required to define exact-width integer types for N = 8, 16, 32, or 64 if and only if it has any type that meets the requirements. It is not required to define them for any other N, even if it supports the appropriate types.

So uint8_t isn't guaranteed to exist, though it will for all platforms where 8 bits = 1 byte. Some embedded platforms may be different, but that's getting very rare. Some systems may define char types to be 16 bits, in which case there probably won't be an 8-bit type of any kind.

Other than that (minor) issue, @Mark Ransom's answer is the best in my opinion. Use the one that most clearly shows what you're using the data for.

Also, I'm assuming you meant uint8_t (the standard typedef from C99 provided in the stdint.h header) rather than uint_8 (not part of any standard).
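Since `uint8_t` is optional, `stdint.h` also lets you detect at preprocessing time whether it exists; a sketch (the `byte_t` name is made up for illustration):

```c
#include <stdint.h>

#ifdef UINT8_MAX
/* The platform provides an exact 8-bit unsigned type with no padding. */
typedef uint8_t byte_t;
#else
/* Fall back to the guaranteed-to-exist "at least 8 bits" type. */
typedef uint_least8_t byte_t;
#endif
```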

Chris Lutz
  • 73,191
  • 16
  • 130
  • 183
  • 1
    DSPs with `CHAR_BIT > 8` are becoming *less* rare, not more. – caf Nov 12 '09 at 23:36
  • 3
    @caf, out of sheer curiosity - can you link to description of some? I know they exist because someone mentioned one (and linked to developer docs for it) in a comp.lang.c++.moderated discussion on whether C/C++ type guarantees are too weak, but I cannot find that thread anymore, and it's always handy to reference that in any similar discussions :) – Pavel Minaev Nov 12 '09 at 23:40
  • 3
    "Some systems may define char types to be 16 bits, in which case there probably won't be an 8-bit type of any kind." - and despite some incorrect objections from me, Pavel has demonstrated in his answer that if char is 16 bits, then even if the compiler does provide an 8 bit type, it *must not* call it `uint8_t` (or typedef it to that). This is because the 8bit type would have unused bits in the storage representation, which `uint8_t` must not have. – Steve Jessop Nov 13 '09 at 03:29
  • 3
    The SHARC architecture has 32-bit words. See http://en.wikipedia.org/wiki/Super_Harvard_Architecture_Single-Chip_Computer for details. – BCran Nov 13 '09 at 16:17
  • 2
    And TI's C5000 DSPs (which were in OMAP1 and OMAP2) are 16bit. I think for OMAP3 they went to C6000-series, with an 8bit char. – Steve Jessop Nov 13 '09 at 17:30
  • Oh yes, it was indeed SHARC. Thanks. Looks like a perfect platform for B (the one between BCPL and C) to me :) – Pavel Minaev Nov 13 '09 at 19:39
  • 4
    Digging into N3242 - "Working Draft, Standard for Programming Language C++", section 18.4.1 `<cstdint>` synopsis says - `typedef unsigned integer type uint8_t; // optional` So, in essence, a C++ standard-conforming library is not required to define uint8_t at all (see the comment // optional) – nightlytrails Feb 23 '13 at 09:17
  • 2
    In cases where the smallest data type is greater than 8 bits (e.g. Ti's C2000-series they are 16-bits) I believe one could use `uint_least8_t` to properly indicate the intent *and* the fact that the type may not actually be 8-bits. – Toby May 29 '15 at 09:18
63

The whole point is to write implementation-independent code. unsigned char is not guaranteed to be an 8-bit type. uint8_t is (if available).
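A small illustration of what the exact-width guarantee buys you: where `uint8_t` exists it has no padding bits, so its arithmetic wraps at exactly 2^8, whereas `unsigned char` is only guaranteed to be *at least* 8 bits wide (the helper below is just a sketch):

```c
#include <stdint.h>
#include <limits.h>

/* uint8_t, where it exists, is exactly 8 bits with no padding bits,
 * so unsigned arithmetic on it wraps at exactly 2^8. */
static uint8_t increment(uint8_t v) {
    return (uint8_t)(v + 1);   /* 255 wraps to 0 */
}
```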

AnT stands with Russia
  • 312,472
  • 42
  • 525
  • 765
  • 4
    ...if it exists on a system, but that's going to be very rare. +1 – Chris Lutz Nov 12 '09 at 22:57
  • 2
    Well, if you really had trouble with your code not compiling on a system because uint8_t didn't exist, you could use find and sed to automatically change all occurrences of uint8_t to unsigned char or something more useful to you. – bazz Jul 24 '14 at 20:36
  • 2
    @bazz - not if you are assuming it is an 8-bit type you can't - for example to unpack data packaged in bytewise fashion by a remote system. The implicit assumption is that the reason for uint8_t to not exist is on a processor where a char is more than 8 bits. – Chris Stratton Apr 11 '15 at 21:56
  • throw in assertion assert(sizeof(unsigned char) == 8); – bazz Apr 12 '15 at 22:29
  • 4
    @bazz incorrect assertion I'm afraid. `sizeof(unsigned char)` will return `1` for 1 byte. But if a system's char and int are the same size, e.g. 16 bits, then `sizeof(int)` will also return `1` – Toby May 29 '15 at 09:22
  • OK .. anybody have a solution? – bazz Jun 04 '15 at 01:07
  • @bazz `#if CHAR_BIT == 8` or `#ifdef UINT8_MAX` – chux - Reinstate Monica Feb 28 '18 at 12:49
  • @jwd: That's false. `uint8_t` **is** guaranteed to be a precisely 8-bit type. What is not guaranteed is that whether this type is *available*. But if it is available, then it is exactly 8-bit wide. It is true that `char` is not guaranteed to be 8-bit wide, but `uint8_t` has nothing to do with `char`. – AnT stands with Russia Mar 18 '20 at 08:08
  • Oh, right you are; I learned something (: I'll delete my comment in a bit just so nobody is misled by it by accident. – jwd Mar 20 '20 at 23:39
  • much rather have a compiler error explicitly revealing that the unsigned 8-bit integer you were _expecting_ doesn't **exist**, than have your code choke and die later on... (unless, of course, your code doesn't rely on said chars being 8 bits, in which case _of course_ feel free to call them chars!) – JamesTheAwesomeDude Jun 11 '20 at 21:41
  • @JamesTheAwesomeDude - This is exactly it. It's perfectly fine to demand things of the platform, this is what uint8_t is explicitly for. If it can't provide a uint8, I want the brakes thrown immediately. – Anne Quinn Aug 31 '21 at 12:08
10

As you said, "almost every system".

char is probably one of the least likely to change, but once you start using uint16_t and friends, using uint8_t blends better, and may even be part of a coding standard.

dchest
  • 1,525
  • 18
  • 20
Justin Love
  • 4,397
  • 25
  • 36
7

There's little. From a portability viewpoint, char cannot be smaller than 8 bits, and nothing can be smaller than char, so if a given C implementation has an unsigned 8-bit integer type, it's going to be char. Alternatively, it may not have one at all, at which point any typedef tricks are moot.

It could be used to better document your code in a sense that it's clear that you require 8-bit bytes there and nothing else. But in practice it's a reasonable expectation virtually anywhere already (there are DSP platforms on which it's not true, but the chances of your code running there are slim, and you could just as well error out using a static assert at the top of your program on such a platform).
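The "error out using a static assert" approach can be written with C11's `_Static_assert` (a sketch; pre-C11 compilers need a negative-array-size trick instead):

```c
#include <limits.h>

/* Refuse to compile on platforms where bytes are not 8 bits wide. */
_Static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");

/* Pre-C11 equivalent: the array size is negative, and thus a
 * compile error, whenever the condition is false. */
typedef char assert_8bit_bytes[(CHAR_BIT == 8) ? 1 : -1];
```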

Pavel Minaev
  • 99,783
  • 25
  • 219
  • 289
  • 1
    For the record, you could make an 8-bit type on any platform: `typedef struct { unsigned i :8; } uint8_t;` but you'd have to use it as `uint8_t x; x.i = ...` so it'd be a bit more cumbersome. – Chris Lutz Nov 12 '09 at 22:45
  • I think chars can go as low as 4 bits, below that and things fall apart a bit in the standard (there is a chance I'm wrong though). – Skizz Nov 12 '09 at 22:48
  • 9
    @Skizz - No, the standard requires `unsigned char` to be able to hold values between 0 and 255. If you can do that in 4 bits, my hat is off to you. – Chris Lutz Nov 12 '09 at 22:50
  • 2
    "it'd be a bit more cumbersome" - cumbersome in the sense that you'd have to walk (swim, catch a plane, etc) all the way over to where the compiler writer was, slap them in the back of the head, and make them add `uint8_t` to the implementation. I wonder, do compilers for DSPs with 16bit chars typically implement `uint8_t`, or not? – Steve Jessop Nov 12 '09 at 23:06
  • @Steve, no, they don't, since there really isn't any way for them to do that. Bitfield trick does indeed work, but bitfields are very limited (you can't have arrays of them, you can't have pointers to them, etc). There's no requirement in C99 for a standard to have `uint8_t` at all - it must have it if and only if it has a corresponding type. It is, however, required to provide `uint_least8_t`, which is _at least_ 8 bits (but can be larger). – Pavel Minaev Nov 12 '09 at 23:19
  • 6
    By the way, on a second thought, it is perhaps the most straightforward way to say "I really need 8 bits" - `#include <stdint.h>`, and use `uint8_t`. If the platform has it, it will give it to you. If the platform doesn't have it, your program will not compile, and the reason will be clear and straightforward. – Pavel Minaev Nov 12 '09 at 23:23
  • I like the logic that if `uint8_t` exists at all, it's going to be `unsigned char` anyway. – caf Nov 12 '09 at 23:41
  • "there really isn't any way for them to do that" - well, it depends how the compiler is coded. You know they're able to generate the code to do 8bit unsigned arithmetic, because of bitfields (probably normal arithmetic, plus some masking). Of course you'd have `sizeof(uint8_t) == sizeof(char)` even though `UCHAR_MAX != 255`, but that's OK, it's why types don't have to use all their storage bits. By "slap in the back of the head" I of course mean "make an impassioned but polite feature request". They're entitled to turn it down, but how confident are they that you won't resort to violence? ;-) – Steve Jessop Nov 12 '09 at 23:43
  • As for "straightforward" - it's certainly the least up-front coding effort, but as you say, for true portability you just have to use `uint_least8_t` and apply the modulo-256 overflow for yourself. I'm guessing you can write it so that on any vaguely optimising compiler where `uint_least8_t` is 8 bits, all the extra ops are elided. – Steve Jessop Nov 12 '09 at 23:50
  • "Of course you'd have sizeof(uint8_t) == sizeof(char) even though UCHAR_MAX != 255, but that's OK, it's why types don't have to use all their storage bits." - it's not okay because `unsigned char` is specifically required to use all storage bits fully by both ISO C and C++. See 6.2.6.1/3 (and the corresponding footnote) for C99, and 3.9.1/1 for C++03. – Pavel Minaev Nov 12 '09 at 23:53
  • It is OK. `unsigned char` (which in this example is 16bit) uses all bits, but AFAIK `uint8_t` doesn't have to. Hence `uint8_t` can be smaller than `unsigned char` in range, although obviously not in storage size. So I don't see why it should be difficult for the compiler writer to support `uint8_t`. It might be monstrously inefficient, but that's a separate issue. – Steve Jessop Nov 13 '09 at 01:16
  • 2
    Still no cigar, sorry: "For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits ... If there are N value bits, each bit shall represent a different power of 2 between 1 and 2^(N-1), so that objects of that type shall be capable of representing values from 0 to 2^(N-1) using a pure binary representation ... The typedef name intN_t designates a signed integer type with width N, __no padding bits__, and a two’s complement representation." – Pavel Minaev Nov 13 '09 at 02:38
  • OK, you win :-). 7.18.1.1 conspicuously doesn't say that the unsigned versions have no padding bits. But it's implied by the requirement that if you provide uint8_t then you must provide int8_t, and the lemma: if uint8_t has padding bits, then int8_t has padding bits, since they're the same width and the same storage size. – Steve Jessop Nov 13 '09 at 03:16
  • Moral of the story: integer types are stupid, albeit fast. If you need arithmetic modulo any particular power of two, either write it yourself or use a POSIX-compliant implementation, where uint8_t is compulsory ;-) – Steve Jessop Nov 13 '09 at 03:21
  • 1
    If you just need arithmetic modulo, unsigned bitfield will do just fine (if inconvenient). It's when you need, say, an array of octets with no padding, that's when you're SOL. Moral of the story is not to code for DSPs, and stick to proper, honest-to-God 8-bit char architectures :) – Pavel Minaev Nov 13 '09 at 06:10
  • Unfortunately, while the Standard would require that if `uint8_t` exists, then `unsigned char` must also be 8 bits, it would not forbid an implementation from making `uint8_t` an 8-bit extended integer type. It would be genuinely useful to have an 8-bit unsigned type which doesn't receive the special aliasing treatment given to `unsigned char`, and nothing would forbid an implementation from making `uint8_t` be such a type [IMHO, the proper way to define such a type would be to give it a special name which could be aliased to `uint8_t` on implementations that support the latter... – supercat Aug 15 '16 at 18:15
  • ...but don't have a non-aliasing 8-bit type]. – supercat Aug 15 '16 at 18:15
7

In my experience there are two places where we want to use uint8_t to mean 8 bits (and uint16_t, etc) and where we can have fields smaller than 8 bits. Both places are where space matters and we often need to look at a raw dump of the data when debugging and need to be able to quickly determine what it represents.

The first is in RF protocols, especially in narrow-band systems. In this environment we may need to pack as much information as we can into a single message. The second is in flash storage where we may have very limited space (such as in embedded systems). In both cases we can use a packed data structure in which the compiler will take care of the packing and unpacking for us:

#pragma pack(1)
typedef struct {
  uint8_t    flag1:1;
  uint8_t    flag2:1;
  uint8_t    reserved:6;  /* not necessary, but makes the layout explicit */
  uint32_t   sequence_no;
  uint8_t    data[8];
  uint32_t   crc32;
} s_mypacket __attribute__((packed));
#pragma pack()

Which method you use depends on your compiler. You may also need to support several different compilers with the same header files. This happens in embedded systems where devices and servers can be completely different - for example you may have an ARM device that communicates with an x86 Linux server.

There are a few caveats with using packed structures. The biggest gotcha is that you must avoid taking the address of a member and dereferencing it. On systems with multibyte-aligned words, this can result in a misaligned-access exception - and a core dump.

Some folks will also worry about performance and argue that using these packed structures will slow down your system. It is true that, behind the scenes, the compiler adds code to access the unaligned data members. You can see that by looking at the assembly code in your IDE.

But since packed structures are most useful for communication and data storage, the data can be extracted into a non-packed representation when working with it in memory. Normally we do not need to work with the entire data packet in memory anyway.
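One sketch of that extract-then-work pattern, assuming a little-endian wire order and field offsets matching the hypothetical packet above (the `read_le32` and `unpack` helpers are made up for illustration). Reading byte by byte and using memcpy avoids dereferencing misaligned addresses entirely:

```c
#include <stdint.h>
#include <string.h>

/* Natural in-memory representation: aligned, no packing needed. */
struct packet_view {
    uint8_t  flags;
    uint32_t sequence_no;
    uint8_t  data[8];
    uint32_t crc32;
};

/* Read a 32-bit little-endian value from an arbitrary byte offset. */
static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Extract the wire format into the natural struct; no alignment traps. */
static struct packet_view unpack(const uint8_t *buf) {
    struct packet_view v;
    v.flags       = buf[0];
    v.sequence_no = read_le32(buf + 1);
    memcpy(v.data, buf + 5, sizeof v.data);
    v.crc32       = read_le32(buf + 13);
    return v;
}
```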

Here is some relevant discussion:

pragma pack(1) nor __attribute__ ((aligned (1))) works

Is gcc's __attribute__((packed)) / #pragma pack unsafe?

http://solidsmoke.blogspot.ca/2010/07/woes-of-structure-packing-pragma-pack.html

Tereus Scott
  • 674
  • 1
  • 6
  • 11
4

That is really important, for example, when you are writing a network analyzer. Packet headers are defined by the protocol specification, not by the way a particular platform's C compiler works.
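For instance, a sketch of decoding a few IPv4 header fields from raw capture bytes (the `read_be16` and `parse_ipv4` helpers are hypothetical; field offsets come from RFC 791, not from any C struct layout):

```c
#include <stdint.h>

/* Read a 16-bit big-endian (network byte order) value. */
static unsigned read_be16(const uint8_t *p) {
    return (unsigned)p[0] << 8 | p[1];
}

struct ipv4_info {
    unsigned version;
    unsigned header_len;   /* in bytes */
    unsigned total_len;
    uint8_t  protocol;
};

/* Offsets per RFC 791: byte 0 holds version/IHL, bytes 2-3 the
 * total length, byte 9 the protocol number. */
static struct ipv4_info parse_ipv4(const uint8_t *pkt) {
    struct ipv4_info info;
    info.version    = pkt[0] >> 4;
    info.header_len = (pkt[0] & 0x0F) * 4;
    info.total_len  = read_be16(pkt + 2);
    info.protocol   = pkt[9];
    return info;
}
```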

VP.
  • 5,122
  • 6
  • 46
  • 71
2

On almost every system I've met uint8_t == unsigned char, but this is not guaranteed by the C standard. If you are trying to write portable code where the exact memory size matters, use uint8_t. Otherwise use unsigned char.

atlpeg
  • 137
  • 5
  • 4
    `uint8_t` _always_ matches range and size of `unsigned char` and padding (none) when `unsigned char` is 8-bit. When `unsigned char` is not 8-bit, `uint8_t` does not exist. – chux - Reinstate Monica Dec 03 '16 at 23:48
  • @chux, Do you have a reference to the exact place in the standard where it says that? If `unsigned char` is 8-bit, is `uint8_t` guaranteed to be a `typedef` thereof and not a `typedef` of an _extended unsigned integer type_? – hsivonen Feb 28 '18 at 07:56
  • @hsivonen "exact place in the standard where it says that?" --> No - yet look to 7.20.1.1. It is readily deduced as `unsigned char/signed char/char` are the smallest type - no smaller than 8 bits. `unsigned char` has no padding. For `uint8_t` to be, it must be 8-bits, no padding, exist because of an implementation provided integer type: matching the minimal requirements of `unsigned char`. As to "... guaranteed to be a typedef..." looks like a good question to post. – chux - Reinstate Monica Feb 28 '18 at 12:42