
I was going through the C++ FAQ at http://www.parashift.com/c++-faq/index.html and found that a byte can also be 64 bits (http://www.parashift.com/c++-faq/very-large-bytes.html). Is that possible, and what is the use of that much storage capacity in a byte?

blacktornado
  • In theory it's possible. In reality, finding a 64-bit char just isn't going to happen. I've heard that *very* early in the process of developing Cray's first C compiler, they had one that had a 64-bit `char`, but by the time they released it, they'd apparently fixed that. – Jerry Coffin Jan 20 '13 at 02:28
  • @blacktornado - There are also older systems where the word size wasn't a multiple of 8 bits, and the language committee likes to allow existing implementations on such systems. See [Exotic architectures the standard committee cares about](http://stackoverflow.com/questions/6971886/exotic-architectures-the-standard-committee-cares-about) – Bo Persson Jan 20 '13 at 11:10

2 Answers

8

The point isn't the usefulness of a big byte per se, but the fact that, for the standard, a byte is the smallest addressable quantity on the system¹; if a system cannot address its memory in units smaller than 64 bits, then `char` will be 64 bits.

Obviously it's almost impossible to find such strange stuff on modern general-purpose computers; these weirdnesses show up on very specialized hardware (I heard DSPs are particularly prone to this kind of stuff), usually for performance reasons.

You can see more about this in this other FAQ.
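
For illustration, here is a minimal sketch (plain standard C++, nothing platform-specific assumed) of how the standard's notion of a byte shows up in code: `sizeof` counts in bytes whatever their width, and `CHAR_BIT` from `<climits>` says how many bits each of those bytes has.

```
#include <climits>  // CHAR_BIT
#include <cstdio>

int main() {
    // sizeof counts in bytes (i.e. in chars), so sizeof(char) is 1 by definition,
    // no matter how many bits a byte happens to have on a given platform.
    static_assert(sizeof(char) == 1, "sizeof(char) is always 1");

    // CHAR_BIT says how wide that byte actually is; the standard only guarantees
    // that it is at least 8. On the exotic hardware discussed here it could be
    // 16, 32 or even 64.
    std::printf("bits per byte: %d\n", CHAR_BIT);
    std::printf("sizeof(int) = %zu bytes = %zu bits\n",
                sizeof(int), sizeof(int) * CHAR_BIT);
}
```
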


  1. As long as it is at least 8 bits wide and capable of holding a basic set of characters (alphanumeric characters plus some symbols, IIRC).
Matteo Italia
  • Is it correct that `sizeof(char) == 1` is guaranteed, but incrementing a `char*` advances the pointer by the "real size" of a `char` on that system? Or also by `1`, even if a "byte" is bigger? – leemes Jan 20 '13 at 02:30
  • @leemes: "1" what? It increments the pointer to the next addressable memory location, since that's the very definition of `char`. Also, `sizeof(char)` is 1 by definition, since `sizeof` measures stuff using `char` as the measurement unit. – Matteo Italia Jan 20 '13 at 02:33
  • Incrementing a `char *` always makes it point to the next character and also advances it by one. So if a `char` is 16 bits instead of 8 bits, then adding one to a pointer makes it point to the next 16-bit character instead of to the next 8-bit character. That will still be an increment of 1. – David Schwartz Jan 20 '13 at 02:35
  • This means I can't read the raw memory of an object byte by byte while assuming a byte has 8 bits... If I want to read it raw, I have to account for the byte size on every increment of the `char*` pointer. – leemes Jan 20 '13 at 02:37
  • @leemes: Not that it's particularly tragic: all the library functions on such a system would work with these "large" chars, so there would be virtually no difference in the code to read raw objects (unless you explicitly relied on having 8 bits per byte somewhere). Still, keep in mind that bigger-than-8-bit bytes are exceedingly rare; unless you are working on weird embedded platforms you don't have to worry about it. – Matteo Italia Jan 20 '13 at 02:40
  • @leemes: You pretty much can't make any assumptions when reading the raw memory for an object. Platforms are pretty much free to store objects in practically any format internally. – David Schwartz Jan 20 '13 at 18:06
  • @DavidSchwartz So? I might want to read them from memory and write them to disk, just as an example. My program (on the same machine) can then read it back (keyword "external algorithms / data structures"). So I don't have to care about the layout, but I do have to care about the byte size. (But maybe I don't even have to care about the byte size in this case, as the file will *also* operate on "C++ bytes", not on "8-bit bytes".) – leemes Jan 20 '13 at 19:11
  • @leemes: There's no rule that says a machine can't switch binary representations whenever it wants to so long as the representation remains constant during a run of a program. Also, a machine could use 9-bit bytes in memory but 8-bit bytes in a file, requiring translation to write raw memory to and from a file. You are making assumptions -- if we're in the world where `char` isn't necessarily 8 bits, none of these assumptions are reliable. – David Schwartz Jan 20 '13 at 19:16
  • @DavidSchwartz Thanks for pointing this out. However, I don't think that I will ever write external algorithms for non-8-bit-char platforms like DSPs. Not after knowing these problems. – leemes Jan 20 '13 at 19:39
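
A minimal sketch of the point discussed in the comments above (my own illustration; the helper name `dump_bytes` is made up): reading an object's raw representation through a `char` pointer. The loop is expressed in bytes, whatever their width, so it reads the same whether `CHAR_BIT` is 8 or 64.

```
#include <climits>
#include <cstddef>
#include <cstdio>

// Dump the object representation of any object, one byte (one char) at a time.
// Each increment of the pointer advances by exactly one byte, i.e. CHAR_BIT bits.
template <typename T>
void dump_bytes(const T& obj) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&obj);
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        // Print each byte with enough hex digits for CHAR_BIT bits (2 when CHAR_BIT == 8).
        std::printf("byte %zu: %0*x\n", i, (CHAR_BIT + 3) / 4,
                    static_cast<unsigned>(p[i]));
    }
}

int main() {
    int x = 0x1234;
    dump_bytes(x);  // one line per byte of x, regardless of the byte width
}
```
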
4

The key macro you are looking for is `CHAR_BIT`, which C99 (and correspondingly modern C++ implementations) only guarantees to be >= 8. POSIX requires `CHAR_BIT` to be exactly 8.
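
If a piece of code really does assume 8-bit bytes (for example because it targets POSIX), one way to make that assumption explicit is a compile-time check; this is only a sketch of the idea, not something the standard requires:

```
#include <climits>

// Fail the build early if the 8-bit-byte assumption (guaranteed by POSIX,
// but not by the C or C++ standards) does not hold on this platform.
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");
```
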

As mentioned, the real reason it is not simply fixed at 8 for all systems is DSPs, which simplify their addressing mechanisms for speed by allowing only one alignment: the architecture's word size. Most modern DSPs are 16- or 32-bit, but I'd imagine some are 64-bit as well.

If you actually have code where this matters, you can use `CHAR_BIT` to extract 8-bit chunks from bytes; the extra work should optimize out on platforms where `CHAR_BIT == 8`.
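
As a sketch of that idea (the helper name `split_into_octets` is made up for illustration), splitting a byte into its 8-bit chunks using `CHAR_BIT`; on a `CHAR_BIT == 8` platform the loop runs exactly once per byte and a decent compiler should reduce it to a plain copy.

```
#include <climits>
#include <cstddef>
#include <vector>

// Split one byte into its 8-bit chunks, least-significant chunk first.
// octets_per_byte is 1 when CHAR_BIT == 8, 8 when CHAR_BIT == 64, and so on.
std::vector<unsigned char> split_into_octets(unsigned char byte) {
    constexpr std::size_t octets_per_byte = (CHAR_BIT + 7) / 8;
    std::vector<unsigned char> out;
    out.reserve(octets_per_byte);
    for (std::size_t i = 0; i < octets_per_byte; ++i) {
        out.push_back(static_cast<unsigned char>((byte >> (8 * i)) & 0xFF));
    }
    return out;
}
```
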

David Perek