Kind of late but I can't resist this one. Predicting the future is tough. Predicting the future of computers can be more hazardous to your code than premature optimization.
Short Answer
While I end this post with how 9-bit systems handled portability with 8-bit bytes, that same experience makes me believe 9-bit byte systems will never arise again in general purpose computers.
My expectation is that future portability issues will come from hardware with a minimum of 16- or 32-bit access, making CHAR_BIT at least 16.
Careful design here may help with any unexpected 9-bit bytes.
QUESTION to /. readers: is anyone out there aware of general purpose CPUs in production today using 9-bit bytes or one's complement arithmetic? I can see where embedded controllers may exist, but not much else.
Long Answer
Back in the 1990s the globalization of computers and Unicode made me expect UTF-16, or larger, to drive an expansion of bits-per-character: CHAR_BIT in C. But as legacy outlives everything, I also expect 8-bit bytes to remain an industry standard that survives at least as long as computers use binary.
BYTE_BIT: bits-per-byte (popular, but not a standard I know of)
BYTE_CHAR: bytes-per-character
The C standard does not address a char consuming multiple bytes. It allows for it, but does not address it.
3.6 byte: (final draft C11 standard ISO/IEC 9899:201x)
addressable unit of data storage large enough to hold any member of the basic character set of the execution environment.
NOTE 1: It is possible to express the address of each individual byte of an object uniquely.
NOTE 2: A byte is composed of a contiguous sequence of bits, the number of which is implementation-defined. The least significant bit is called the low-order bit; the most significant bit is called the high-order bit.
Until the C standard defines how to handle BYTE_CHAR values greater than one, and I'm not talking about “wide characters”, this is the primary factor portable code must address, not larger bytes. Existing environments where CHAR_BIT is 16 or 32 are what to study. Some word-addressed DSPs, such as TI's C28x family with its 16-bit char, are one example. I see two basic modes for reading external byte streams that developers need to choose from:
- Unpacked: read one external BYTE_BIT-bit byte into each local character, leaving the upper bits unused. Beware of sign extension.
- Packed: read BYTE_CHAR external bytes into each local character.
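As a rough illustration of the two modes, here is a minimal sketch that works on in-memory data. Everything in it is an assumption made for the example: `wide_char_t` is a hypothetical stand-in for a 16-bit char (so it can run on an ordinary 8-bit-byte machine), the function names are invented, and the packed mode arbitrarily puts the first external byte in the high half.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for char on a machine where CHAR_BIT is 16. */
typedef uint16_t wide_char_t;

/* Unpacked: one external 8-bit byte per local character.
   When sign_extend is nonzero, bit 7 is copied into the upper 8 bits. */
void unpack_octets(const uint8_t *in, size_t n, wide_char_t *out, int sign_extend)
{
    for (size_t i = 0; i < n; i++) {
        wide_char_t c = in[i];
        if (sign_extend && (c & 0x80u))
            c |= 0xFF00u;
        out[i] = c;
    }
}

/* Packed: two external 8-bit bytes per local character, first byte in the
   high half (an assumed byte order). A trailing odd byte is ignored.
   Returns the number of local characters written. */
size_t pack_octets(const uint8_t *in, size_t n, wide_char_t *out)
{
    size_t chars = 0;
    for (size_t i = 0; i + 1 < n; i += 2)
        out[chars++] = (wide_char_t)((in[i] << 8) | in[i + 1]);
    return chars;
}
```

With the input bytes 0x12 0x80, the unpacked mode yields 0x0012 0xFF80 with sign extension and 0x0012 0x0080 without it, while the packed mode yields the single character 0x1280.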
Portable programs may need an API layer that addresses the byte issue. To sketch, on the fly, an idea I reserve the right to attack in the future:
#include <limits.h>  // CHAR_BIT
#include <stdio.h>   // FILE, size_t

#define BYTE_BIT  8                      // bits-per-byte
#define BYTE_CHAR (CHAR_BIT / BYTE_BIT)  // bytes-per-char

size_t byread(void *ptr,
              size_t size,   // number of BYTE_BIT bytes
              int packing,   // bytes to read per char
                             // (negative for sign extension)
              FILE *stream);
size_t bywrite(const void *ptr,
               size_t size,  // number of BYTE_BIT bytes
               int packing,  // bytes to write per char
               FILE *stream);
size
number of BYTE_BIT bytes to transfer.
packing
bytes to transfer per char. While typically 1 or BYTE_CHAR, it could instead be the BYTE_CHAR of the external system, which can be smaller or larger than that of the current system.
- Never forget endianness clashes.
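To make the idea a little more concrete, here is one possible way byread() could be filled in. This is only a sketch under assumptions the post does not settle: that each getc() call delivers exactly one 8-bit external byte, that packed bytes arrive most significant first, that the return value is the number of external bytes stored, and that a partial character at end of file is dropped. On an ordinary machine BYTE_CHAR is 1, so only a packing of 1 or -1 is accepted and the sign-extension branch never fires.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

#define BYTE_BIT  8                      /* bits per external byte */
#define BYTE_CHAR (CHAR_BIT / BYTE_BIT)  /* external bytes per local char */

/* Sketch only: reads `size` external bytes, storing |packing| of them in
   each local char. Returns the number of external bytes stored. */
size_t byread(void *ptr, size_t size, int packing, FILE *stream)
{
    unsigned char *out = ptr;
    size_t per_char = (size_t)abs(packing);
    size_t done = 0;

    if (per_char < 1 || per_char > (size_t)BYTE_CHAR)
        return 0;                        /* packing cannot fit in a char */

    while (done + per_char <= size) {
        unsigned long long value = 0;
        for (size_t i = 0; i < per_char; i++) {
            int c = getc(stream);
            if (c == EOF)
                return done;             /* partial character is dropped */
            value = (value << BYTE_BIT) | ((unsigned)c & 0xFFu);
        }
        size_t bits = per_char * BYTE_BIT;
        /* Sign-extend only when the data does not already fill the char. */
        if (packing < 0 && bits < CHAR_BIT && ((value >> (bits - 1)) & 1u))
            value |= ~0ULL << bits;
        *out++ = (unsigned char)value;
        done += per_char;
    }
    return done;
}
```

A matching bywrite() would mirror the loop with putc(), emitting the most significant external byte of each char first under the same assumed byte order.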
Good Riddance To 9-Bit Systems:
My prior experience writing programs for 9-bit environments leads me to believe we will not see such again, unless you happen to need a program to run on a really old legacy system somewhere, likely in a 9-bit VM on a 32/64-bit system. Since 2000 I have occasionally searched for, but have not seen, references to current descendants of the old 9-bit systems.
Any future general purpose 9-bit computers, highly unexpected in my view, would likely have either an 8-bit mode or an 8-bit VM (@jstine) to run programs under. The only exception would be special-purpose embedded processors, which general purpose code would not be likely to run on anyway.
In days of yore one 9-bit machine was the PDP-15. A decade of wrestling with a clone of this beast makes me never expect to see 9-bit systems arise again. My top picks on why follow:
- The extra data bit came from robbing the parity bit in core memory. Old 8-bit core carried a hidden parity bit with it. Every manufacturer did it. Once core got reliable enough, some system designers switched the already existing parity bit to a data bit in a quick ploy to gain a little more numeric power and more memory addresses during times of weak, non-MMU machines. Current memory technology does not have such parity bits, machines are not so weak, and 64-bit memory is so big. All of which should make the design changes less cost effective than they were back then.
- Transferring data between 8-bit and 9-bit architectures, including off-the-shelf local I/O devices, and not just other systems, was a continuous pain. Different controllers on the same system used incompatible techniques:
- Use the low-order 16 bits of 18-bit words.
- Use the low-order 8 bits of 9-bit bytes, where the extra high-order bit might be set to the parity of bytes read from parity-sensitive devices.
- Combine the low-order 6 bits of three 8-bit bytes to make 18-bit binary words.
Some controllers allowed selecting between 18-bit and 16-bit data transfers at run time. What future hardware, and supporting system calls, your programs would find just can't be predicted in advance.
- Connecting to the 8-bit Internet will be horrid enough by itself to kill any 9-bit dreams someone has. They got away with it back then as machines were less interconnected in those times.
- Having something other than a power of 2 bits per byte in byte-addressed storage brings up all sorts of troubles. Example: if you want an array of thousands of bits in 8-bit bytes you can write:
unsigned char bits[1024] = { 0 }; bits[n>>3] |= 1 << (n&7);
With 8-bit bytes the byte index and bit position come from a shift and a mask. To fully pack 9-bit bytes you must do actual divides, which brings horrid performance penalties. This also applies to bytes-per-word.
- Any code not actually tested on 9-bit byte hardware may well fail on its first actual venture into the land of unexpected 9-bit bytes, unless the code is so simple that refactoring it in the future for 9 bits is only a minor issue. The prior byread()/bywrite() may help here, but it would likely need an additional CHAR_BIT mode setting to set the transfer mode, returning how the current controller arranges the requested bytes.
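The bit-array point in the list above can be sketched side by side. This is an illustration only: the function names are invented, and since no 9-bit hardware is at hand each simulated 9-bit byte is assumed to live in the low 9 bits of a uint16_t.

```c
#include <stddef.h>
#include <stdint.h>

/* 8 bits per unit: index and bit position come from a shift and a mask. */
void set_bit8(uint8_t *units, size_t n)
{
    units[n >> 3] |= (uint8_t)(1u << (n & 7));
}

int get_bit8(const uint8_t *units, size_t n)
{
    return (units[n >> 3] >> (n & 7)) & 1;
}

/* 9 bits per unit: index and bit position need a divide and a remainder.
   Each simulated 9-bit byte occupies the low 9 bits of a uint16_t. */
void set_bit9(uint16_t *units, size_t n)
{
    units[n / 9] |= (uint16_t)(1u << (n % 9));
}

int get_bit9(const uint16_t *units, size_t n)
{
    return (units[n / 9] >> (n % 9)) & 1;
}
```

Setting bit 10 lands in unit 1 in both layouts, but at bit position 2 with 8-bit units and bit position 1 with 9-bit units, which is exactly the kind of difference untested code trips over.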
To be complete, anyone who wants to worry about 9-bit bytes for the educational experience may also need to worry about one's complement systems coming back; something else that seems to have died a well deserved death (two zeros, +0 and -0, are a source of ongoing nightmares... trust me). Back then 9-bit systems often seemed to be paired with one's complement operations.
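For anyone who has never met the two zeros, here is a small simulation of one's complement arithmetic. The 18-bit width and all the names are assumptions chosen for illustration; values are held in the low 18 bits of a uint32_t rather than taken from any real machine's headers.

```c
#include <stdint.h>

#define OC_MASK 0x3FFFFu  /* 18 value bits (assumed word size) */

/* Negation in one's complement is a plain bitwise NOT. */
uint32_t oc_negate(uint32_t x)
{
    return ~x & OC_MASK;
}

/* Addition with end-around carry: a carry out of the top bit is added
   back into the low end. */
uint32_t oc_add(uint32_t a, uint32_t b)
{
    uint32_t sum = a + b;
    if (sum > OC_MASK)
        sum = (sum & OC_MASK) + 1;
    return sum & OC_MASK;
}

/* Both all-zeros (+0) and all-ones (-0) mean zero. */
int oc_is_zero(uint32_t x)
{
    return x == 0 || x == OC_MASK;
}

/* Value equality must treat +0 and -0 as the same number. */
int oc_equal(uint32_t a, uint32_t b)
{
    return a == b || (oc_is_zero(a) && oc_is_zero(b));
}
```

Here oc_add(5, oc_negate(5)) produces the all-ones pattern, -0, so a naive `== 0` test on the raw bits fails even though the arithmetic result is zero; every comparison has to go through something like oc_equal().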