33

I know that the C/C++ standards only guarantee a minimum of 8 bits per char, and that theoretically 9/16/42/anything else is possible, and that therefore all sites about writing portable code warn against assuming 8bpc. My question is how "non-portable" is this really?

Let me explain. As I see it, there are three categories of systems:

  1. Computers - I mean desktops, laptops, servers, etc. running Mac/Linux/Windows/Unix/*nix/posix/whatever (I know that list isn't strictly correct, but you get the idea). I would be very surprised to hear of any such system where char is not exactly 8 bits. (please correct me if I am wrong)
  2. Devices with operating systems - This includes smartphones and similar embedded systems. While I would not be very surprised to find such a system where char is more than 8 bits, I have not heard of one to date (again, please inform me if I am just unaware)
  3. Bare metal devices - VCRs, microwave ovens, old cell phones, etc. In this field I haven't the slightest experience, so anything can happen here. However, do I really need my code to be cross platform between my Windows desktop and my microwave oven? Am I likely to ever have code common to both?

Bottom line: Are there common (more than 0.001%) platforms (in categories 1 & 2 above) where char is not 8 bits? And is my above surmise true?

Baruch
  • why not just use "sizeof(char)" where it matters and move on with your life? – tbert Jul 22 '12 at 12:44
  • 10
    @tbert `sizeof(char)` is always 1. It is not the size in *bits*, but rather in *chars* – Baruch Jul 22 '12 at 12:45
  • 2
    no, it's the size of the type in *bytes*, from whence you can derive the number of bits. – tbert Jul 22 '12 at 12:46
  • 1
    @tbert sizeof(char) will always return 1. sizeof is based on the size of the char. So if sizeof(int) == 4, then your int is 4 times the size of the char (whatever size that is). – Josh Petitt Jul 22 '12 at 12:47
  • 2
    @tbert yes, it's size in bytes, but **a byte is not always 8 bits**. It's not the size in **octets**, which you would have meant IMO. –  Jul 22 '12 at 12:48
  • @H2CO3 at what point did I say "a byte is always 8 bits"? I said "You can derive the number of bits from the number of bytes". Is everybody on this site illiterate? – tbert Jul 22 '12 at 12:49
  • 1
    @tbert apparently, no, except you. –  Jul 22 '12 at 12:49
  • 2
    @tbert, if sizeof(char) always returns 1, how does that help the OP? – Josh Petitt Jul 22 '12 at 12:50
  • 4
    POSIX requires char=8bits. OTOH, some widely used DSPs have 16 or 32-bit chars, e.g. some TI ones used on many ARM platforms. Your smartphone may have one. – ninjalj Jul 22 '12 at 12:53
  • http://stackoverflow.com/questions/6149740/size-of-byte-clarification – loler Jul 22 '12 at 12:53
  • 1
    @tbert: in ISO C a char=byte, and is at least 8 bits, but can be more. POSIX requires it to be 8 bits. RFCs use the term octet to avoid confusion. – ninjalj Jul 22 '12 at 12:55
  • 1
    See also: http://stackoverflow.com/questions/2098149/what-platforms-have-something-other-than-8-bit-char – ninjalj Jul 22 '12 at 12:57
  • 2
    Your comments about bare metal devices seem to be based around the particular coding you plan to do. You are asking us to validate your assumptions without telling us what they're based on. For example, if you write low-level computation libraries or data transport code, it's quite possible it may run on future bare metal devices. If you write GUI programs, maybe not. – David Schwartz Jul 22 '12 at 13:08
  • There are many old systems whose [word size is not a power of 2](https://en.wikipedia.org/wiki/Word_%28computer_architecture%29#Table_of_word_sizes), and neither is their char size. Systems with 9-, 12-, 18-, 36-, 60-bit and even odder char sizes also exist. http://stackoverflow.com/questions/5516044/system-where-1-byte-8-bit – phuclv Oct 10 '13 at 05:09
  • 2
    I think this was the question that inspired me to spend months investigating and designing a ternary-based fork of C++, a [ternary asm](https://docs.google.com/spreadsheets/d/1naRck7KxdtjOa0DwQOyPgNuLgH_DGQ8y5rjhWQYFhgg/edit#gid=2017892698), and a ternary CPU. https://xkcd.com/356/ – Mooing Duck Jan 25 '23 at 17:32
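
To make the distinction in the comment thread above concrete, here is a minimal C sketch; the printed values are of course platform-dependent:

#include <limits.h>   /* CHAR_BIT */
#include <stdio.h>

int main(void)
{
    /* sizeof measures in chars, so sizeof(char) is 1 by definition;
       CHAR_BIT says how many bits that one char actually contains. */
    printf("sizeof(char) = %zu\n", sizeof(char));             /* always 1   */
    printf("CHAR_BIT     = %d\n", CHAR_BIT);                  /* at least 8 */
    printf("int is %zu chars wide, i.e. %zu bits\n",
           sizeof(int), sizeof(int) * CHAR_BIT);
    return 0;
}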

5 Answers

29

Use CHAR_BIT from limits.h:

http://www.cplusplus.com/reference/clibrary/climits/

Also, when you need an exact size, use the fixed-width types from stdint.h.
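
As a minimal sketch of that advice (illustrative only; mix32 and octet are made-up names): the exact-width uint8_t is only guaranteed where CHAR_BIT is 8, while the least-width types always exist.

#include <limits.h>
#include <stdint.h>

#if CHAR_BIT == 8
typedef uint8_t       octet;   /* exact-width; only possible when char is 8 bits */
#else
typedef uint_least8_t octet;   /* always available; holds at least 8 bits */
#endif

/* Exact 32-bit wrap-around arithmetic on any platform that provides uint32_t. */
uint32_t mix32(uint32_t x)
{
    return (x << 7) ^ (x >> 3);
}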

Josh Petitt
  • 22
    I'd recommend this :-) #if (CHAR_BIT != 8) #error You are weird, go away! #endif – Josh Petitt Jul 22 '12 at 12:58
  • 2
    I want to know if I can write code assuming `char` is 8 bits, not how to find the number of bits in a `char` – Baruch Jul 22 '12 at 12:59
  • 1
    @baruch, maybe. Do you care? If you want to pack 32 bits into an unsigned int and you are doing bit-twiddling, or using memcpy, memset, then yes, you probably care. So in that case, use stdint.h types. If you are passing values to functions, or doing other stuff where you just want the native int type (or unsigned) used, then you probably don't care. Anytime you do actually care a lot, I would put a preprocessor guard somewhere that either warns the user they are entering no-man's land, or resolves the problem by providing two different implementations. – Josh Petitt Jul 22 '12 at 13:03
  • @baruch, serialization is also an area where you have to be careful. – Josh Petitt Jul 22 '12 at 13:13
  • @baruch, for these problems, lean on your compiler vendor and their standard implementation as much as possible. They've done most of the hard part for you. Also, if you do care about the number of bits in a byte, then I don't think it is possible to write 100% portable code. In that case you will probably want to write two implementations to take care of any differences between the platforms. This will be easier, faster, and better than trying to write some convoluted mess of which only half will ever run on any given platform. – Josh Petitt Jul 22 '12 at 13:16
  • @baruch: *I want to know if I can write code assuming char is 8 bits*, how would that assumption realize in code? In most cases the size of `char` is not assumed, but silently *ignored* – David Rodríguez - dribeas Jul 22 '12 at 17:14
  • @DavidRodríguez-dribeas for example, can I use a char to index into a ring buffer 256 slots big? Will it wrap around from 255 to 0? Or can I left-shift it 7 bits to get 128 or 0, depending on the LSB? Just a few examples. – Baruch Jul 22 '12 at 18:43
  • If CHAR_BIT is not 8, uint8_t cannot be provided at all: an exact-width 8-bit type is impossible when the smallest addressable unit is wider than 8 bits. (Emulating one would mean masking with 0xFF on every operation, which is slow.) – Malcolm McLean Feb 24 '17 at 17:34
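
On the ring-buffer question in the comments above: relying on an unsigned char index to wrap from 255 to 0 only works where CHAR_BIT is 8. A hypothetical sketch (names made up) that keeps the 256-slot wrap explicit, so it behaves the same on wider-char platforms:

#include <stdint.h>

#define RING_SLOTS 256u                /* power of two, so masking works */

static uint_least8_t ring[RING_SLOTS];
static unsigned int  head;             /* plain unsigned index, masked on update */

void ring_push(uint_least8_t value)
{
    ring[head] = value;
    head = (head + 1u) & (RING_SLOTS - 1u);   /* wraps 255 -> 0 on every platform */
}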
6

For example, many DSPs have CHAR_BIT greater than or equal to 16.

md5
4

At the least, much as integer sizes grew on 64-bit architectures, future platforms may use a wider char with more bits. ASCII characters might become obsolete, replaced by Unicode. This might be a reason to be cautious.

avakar
perreal
  • 1
    This is actually a counterexample. In order not to break all the code relying on int being 32 bits, I think all common compilers leave int as 32 bits even on 64-bit systems. – Baruch Jul 22 '12 at 12:55
  • 1
    @baruch, I agree they do so currently, however, who knows for how long. – perreal Jul 22 '12 at 12:58
0

You can normally safely assume that files will have 8-bit bytes, or, if not, that 8-bit-byte files can be converted to a zero-padded native format by a commonly used tool. But it is much more dangerous to assume that CHAR_BIT == 8. Currently that is almost always the case, but it might not always be so in the future. 8-bit access to memory is increasingly a bottleneck.
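
For instance, a hedged sketch of octet-based serialization (write_u32_be is a made-up helper): each value is split with shifts and 0xFF masks, so nothing depends on a char in memory being exactly one octet.

#include <stdint.h>
#include <stdio.h>

/* Write a 32-bit value to a file as four big-endian 8-bit bytes. */
int write_u32_be(FILE *out, uint_least32_t v)
{
    unsigned char octets[4];
    octets[0] = (unsigned char)((v >> 24) & 0xFFu);
    octets[1] = (unsigned char)((v >> 16) & 0xFFu);
    octets[2] = (unsigned char)((v >>  8) & 0xFFu);
    octets[3] = (unsigned char)( v        & 0xFFu);
    return fwrite(octets, 1, sizeof octets, out) == sizeof octets ? 0 : -1;
}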

Malcolm McLean
  • 1
    If we all assume CHAR_BIT is equal to 8, then future processors will never be able to gain a foothold in the market because when we compile our programs to these processors, our programs will not work. Thus, CHAR_BIT will always be equal to 8. Haha? (actually, this makes me really depressed) – Jack G Dec 09 '18 at 19:19
0

The POSIX standards require CHAR_BIT to be 8.

So, if you only care about your code running on POSIX-compliant platforms, then assuming CHAR_BIT == 8 is fine and good.

The vast majority of commodity PC platforms and build systems comply with this requirement. Almost any platform that uses the BSD socket interface implicitly has this requirement too, because the assumption that a platform byte is an octet is extremely widespread.

#if CHAR_BIT != 8
#error Your platform is unsupported!
#endif
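
As an illustration of the socket point above, a hypothetical helper (recv_all is made up here), sketched under the assumption of a POSIX system:

#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>   /* recv(), POSIX */

/* Read exactly n bytes from a connected socket. The byte counts here are
   char counts; they match the protocol's octet counts only because POSIX
   fixes CHAR_BIT at 8. */
int recv_all(int fd, unsigned char *buf, size_t n)
{
    size_t got = 0;
    while (got < n) {
        ssize_t r = recv(fd, buf + got, n - got, 0);
        if (r <= 0)
            return -1;    /* error, or the peer closed the connection */
        got += (size_t)r;
    }
    return 0;
}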

Why did POSIX mandate CHAR_BIT==8?

You should only worry about this assumption / constraint if you want your code to run today on embedded and esoteric platforms. Otherwise, it's a pretty safe assumption in my view.

jschultz410