What platforms have something other than 8-bit char?

Question

Every now and then, someone on SO points out that char (aka 'byte') isn't necessarily 8 bits.

It seems that 8-bit char is almost universal. I would have thought that for mainstream platforms, it is necessary to have an 8-bit char to ensure its viability in the marketplace.

Both now and historically, what platforms use a char that is not 8 bits, and why would they differ from the "normal" 8 bits?

When writing code, and thinking about cross-platform support (e.g. for general-use libraries), what sort of consideration is it worth giving to platforms with non-8-bit char?

In the past I've come across some Analog Devices DSPs for which char is 16 bits. DSPs are a bit of a niche architecture I suppose. (Then again, at the time hand-coded assembler easily beat what the available C compilers could do, so I didn't really get much experience with C on that platform.)

The CDC Cyber series had a 6/12 bit encoding. The most popular characters were 6 bits. The remaining characters used 12 bits. — Thomas Matthews, Jan 20 '10 at 00:07
I'm sure there are some platforms that have non 8-bit chars but in 15 years coding including working with custom hardware through to games consoles, I've never encountered one yet. Still time though.... — zebrabox, Jan 20 '10 at 00:09
The PDP-11 nailed it down. The notion that a character can be encoded in a char is seriously obsolete. — Hans Passant, Jan 20 '10 at 01:38
"The PDP-11 nailed it down" -- You mean because C was first implemented for the PDP-11 with 8 bit bytes? But C was next implemented for Honeywell machines with 9 bit bytes. See K&R version 1. Also, the question asked about char (i.e. byte) not about character (one or more bytes encoding something that wasn't asked about). — Windows programmer, Jan 20 '10 at 03:40
DEC-10 and DEC-20 had 36-bit words. Five 7-bit ASCII characters per word was quite common. Also six 6-bit characters were used. — David R Tribble, Jan 20 '10 at 17:12
I've seen compilers specifically designed for microcontrollers, where you could specify the size of char in the compiler options. — vsz, Feb 20 '16 at 08:23
@vsz: Can you say specifically which compilers for which microcontrollers? — Craig McQueen, Feb 20 '16 at 09:52
@CraigMcQueen : If I remember correctly, CodeVision for Atmel microcontrollers lets one choose the size of char — vsz, Feb 20 '16 at 10:10

Steve Jessop · Accepted Answer · 2012-07-06T13:54:15.257

103

char is also 16 bit on the Texas Instruments C54x DSPs, which turned up for example in OMAP2. There are other DSPs out there with 16 and 32 bit char. I think I even heard about a 24-bit DSP, but I can't remember what, so maybe I imagined it.

Another consideration is that POSIX mandates CHAR_BIT == 8. So if you're using POSIX you can assume it. If someone later needs to port your code to a near-implementation of POSIX, that just so happens to have the functions you use but a different size char, that's their bad luck.

In general, though, I think it's almost always easier to work around the issue than to think about it. Just type CHAR_BIT. If you want an exact 8 bit type, use int8_t. Your code will noisily fail to compile on implementations which don't provide one, instead of silently using a size you didn't expect. At the very least, if I hit a case where I had a good reason to assume it, then I'd assert it.
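Here is a sketch of both suggestions: loop over `CHAR_BIT` instead of a literal 8, and use an exact-width type only where exactly 8 bits is really required. `BITS_OF`, `popcount_uchar` and `low_octet` are illustrative names, not library functions.

```c
#include <limits.h>
#include <stdint.h>

/* Number of bits in an object of type T (padding bits included). */
#define BITS_OF(T) (sizeof(T) * CHAR_BIT)

/* Count set bits in an unsigned char without assuming it has 8 bits. */
unsigned popcount_uchar(unsigned char c)
{
    unsigned n = 0;
    for (unsigned i = 0; i < CHAR_BIT; ++i)  /* CHAR_BIT, not a literal 8 */
        n += (c >> i) & 1u;
    return n;
}

/* Where exactly 8 bits is genuinely needed, say so in the type.
   uint8_t is optional, so this fails to compile where it cannot exist. */
uint8_t low_octet(uint32_t v)
{
    return (uint8_t)(v & 0xFFu);
}
```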

edited Jul 06 '12 at 13:54

answered Jan 20 '10 at 01:22

Steve Jessop

TI C62xx and C64xx DSPs also have 16-bit chars. (uint8_t isn't defined on that platform.) – myron-semack Jan 20 '10 at 02:35

Many DSPs for audio processing are 24-bit machines; the [BelaSigna](http://www.onsemi.com/PowerSolutions/parametrics.do?id=2210) DSPs from On Semi (after they bought AMI Semi); the [DSP56K/Symphony Audio](http://www.freescale.com/webapp/sps/site/homepage.jsp?code=563XXGPDSP&tid=prodlib) DSPs from Freescale (after they were spun off from Motorola). – David Cary Jul 06 '12 at 13:52

@msemack C64xx has hardware for 8/16/32/40, and 8bit char – user3528438 Apr 16 '15 at 20:45
@user3528438 It did not at the time I posted that. Code Composer Studio 3.x, there was no uint8_t in stdint.h. – myron-semack Apr 17 '15 at 12:30

Rather than `assert()` (if that's what you meant), I'd use `#if CHAR_BIT != 8` ... `#error "I require CHAR_BIT == 8"` ... `#endif` – Keith Thompson Oct 02 '15 at 20:52
There have been 9 bit UNIX boxes for ages. Lots of the old standards talk about 9 bit bytes, CHAR_BIT notwithstanding. – Joshua Dec 01 '15 at 23:11

@KeithThompson Is there any reason not to use `static_assert()`? – Qix - MONICA WAS MISTREATED Feb 17 '17 at 04:35

@Qix: Portability. IIRC `static_assert` was only added to the C standard in 2011. – Keith Thompson Feb 17 '17 at 06:14
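The two styles from this exchange can be combined. Use a preprocessor `#if`/`#error` for anything the preprocessor can see, such as `CHAR_BIT`, and C11's `_Static_assert` for constant expressions it cannot evaluate, such as `sizeof`. This is a sketch. The `sizeof(float)` condition is only an example of an assumption someone might want to pin down.

```c
#include <limits.h>

/* Works with any C89/C99/C11 preprocessor. */
#if CHAR_BIT != 8
#error "This code requires CHAR_BIT == 8"
#endif

/* sizeof is invisible to #if, so use _Static_assert where C11 is available. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
_Static_assert(sizeof(float) * CHAR_BIT == 32,
               "float is assumed to occupy 32 bits");
#endif
```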

Regarding TIs hardware: `uint8_t` is most definitely defined: https://github.com/energia/c2000-core/blob/master/cores/c2000/F2802x_Device.h (line 112) (`typedef unsigned char uint8_t; `) This workaround made a library i was using compile, but then it broke down at runtime :'( – Lanting Mar 02 '18 at 08:18
The C6000 DSP family _always_ had `CHAR_BIT = 8`. I was at TI before C62x debuted, and worked on the product family (or closely adjacent families) throughout my career there. The `uint8_t` type didn't show up until C99, and so it wasn't part of the tool chain until they added C99 support. C62x debuted in 1997. I don't remember when we added C99 support but it wasn't 1999, I'm sure. – Joe Z Dec 18 '21 at 02:03

John Feminella · Answer 2 · 2010-01-20T14:50:25.107

41

When writing code, and thinking about cross-platform support (e.g. for general-use libraries), what sort of consideration is it worth giving to platforms with non-8-bit char?

It's not so much that it's "worth giving consideration" to something as it is playing by the rules. In C++, for example, the standard says all bytes will have "at least" 8 bits. If your code assumes that bytes have exactly 8 bits, you're violating the standard.

This may seem silly now -- "of course all bytes have 8 bits!", I hear you saying. But lots of very smart people have relied on assumptions that were not guarantees, and then everything broke. History is replete with such examples.

For instance, most early-90s developers assumed that a particular no-op CPU timing delay taking a fixed number of cycles would take a fixed amount of clock time, because most consumer CPUs were roughly equivalent in power. Unfortunately, computers got faster very quickly. This spawned the rise of boxes with "Turbo" buttons -- whose purpose, ironically, was to slow the computer down so that games using the time-delay technique could be played at a reasonable speed.

One commenter asked where in the standard it says that char must have at least 8 bits. It's in section 5.2.4.2.1 of the C standard. This section defines CHAR_BIT, the number of bits in the smallest object that is not a bit-field (a byte), and lists 8 as its minimum value. It also says:

Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.

So an implementation may define CHAR_BIT as any value of 8 or higher.
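`unsigned char` cannot have padding bits, so the limits in the same header also fix UCHAR_MAX at 2^CHAR_BIT - 1. The small sketch below recovers CHAR_BIT from UCHAR_MAX by shifting it right until it reaches zero. The function name is illustrative.

```c
#include <limits.h>

/* Count the value bits of unsigned char. Because unsigned char has no
   padding bits, the result equals CHAR_BIT, which is always >= 8. */
int uchar_value_bits(void)
{
    int n = 0;
    for (unsigned long v = UCHAR_MAX; v != 0; v >>= 1)
        ++n;
    return n;
}
```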

edited Jan 20 '10 at 14:50

answered Jan 20 '10 at 00:19

John Feminella

I haven't seen a Turbo button in at least 20 years - do you really think it's germane to the question? – Mark Ransom Jan 20 '10 at 03:30

@Mark Ransom: That's the whole point. Developers often rely on assumptions which seem to be true at the moment, but which are much shakier than they initially appear. (Can't count the number of times I've made _that_ mistake!) The Turbo button should be a painful reminder not to make unnecessary assumptions, and certainly not to make assumptions that aren't guaranteed by a language standard as if they were immutable facts. – John Feminella Jan 20 '10 at 03:33

Could you point out the place in the C++ Standard which says that the byte has at least 8 bits? It is a common belief, but I personally failed to find it in the Standard. The only thing I found in the Standard is which characters must be representable by `char`: there are more than 64 of them but fewer than 128, so 7 bits would be enough. – Adam Badura Jan 20 '10 at 06:48

Section 18.2.2 invokes the C standard for it. In the C standard it's section 7.10 and then section 5.2.4.2.1. Page 22 in the C standard. – Windows programmer Jan 21 '10 at 03:48

So other answers and comments mention machines with 5 bit, 6 bit and 7 bit bytes. Does that mean that you cannot run a C program on that machine that complies with the standard? – Jerry Jeremiah Feb 07 '18 at 01:30
About assumptions and video games: even games made in 2000 are often 1) assuming the existence of cycles~time correlation and 2) assuming the code is executed by a single threaded CPU. Try running vanilla version of "Deus Ex" (without delay/render/affinity patches) on modern PC to see how things break. – Sergey.quixoticaxis.Ivanov Jun 10 '18 at 16:19
@JerryJeremiah from C++11 ISO **[intro.memory]**: _A byte is at least large enough to contain any member of the basic execution character set and the **eight-bit** code units of the Unicode UTF-8 encoding form and is composed of a contiguous sequence of bits, the number of which is implementation defined._ I don't know about C. – Sergey.quixoticaxis.Ivanov Jun 10 '18 at 16:21

@JerryJeremiah: You can run C on a machine whose hardware datum unit is less than 8 bits, but then a C "byte" will be multiple datum units. Your physical pointers will have a step size less than a byte, but the C program will never use that granularity. (And note there won't be any C data type for a sub-byte datum) – Ben Voigt Jan 17 '19 at 21:50
Saying that code that assumes that `char` is 8 bits is "violating the standard" isn't really accurate–the standard does not *mandate* that your code be portable. Your code is just implementation-specific but still valid; perhaps (purposefully) "ignorant of the C/C++ language standards". (If POSIX is the standard you care about, this is not just valid but guaranteed). – saagarjha Sep 02 '20 at 22:34

score 37 · Answer 3 · answered Jan 20 '10 at 00:20

Machines with 36-bit architectures have 9-bit bytes. According to Wikipedia, machines with 36-bit architectures include:

Digital Equipment Corporation PDP-6/10
IBM 701/704/709/7090/7094
UNIVAC 1103/1103A/1105/1100/2200

answered Jan 20 '10 at 00:20

R Samuel Klatchko

Also Honeywell machines, such as maybe the second machine where C was implemented. See K&R version 1. – Windows programmer Jan 20 '10 at 03:44

Actually, the DEC-10 also had 6-bit characters; you could pack 6 of these into a 36-bit word (ex-DEC-10 programmer talking) – Jan 20 '10 at 14:52

The DEC-20 used five 7-bit ASCII characters per 36-bit word on the TOPS-20 O/S. – David R Tribble Jan 20 '10 at 17:19

As far as I remember, on the PDP-10, 7-bit ASCII packed 5 bytes to a word was the most common format for text files (leaving one spare bit, which when set was interpreted in some contexts as an indication that the word was a line number). The SIXBIT charset (a subset of ASCII, dropping the control and lower-case columns) was used for some things (for instance for names in object files) but not for text files, as there was no way to indicate the end of lines... 9-bit characters were not in common use, except perhaps to port C programs to the PDP-10. – AProgrammer Jan 21 '10 at 12:51

That joke was actually implemented for supporting Unicode on this architecture. – Joshua Dec 14 '11 at 07:31

I imagine that the reason octal was ever actually used was because 3 octal digits neatly represent a 9-bit byte, just like we usually use hexadecimal today because two hexadecimal digits neatly represent an 8-bit byte. – bames53 Jul 11 '12 at 00:20

The PDP-6/PDP-10/DEC-10/DEC-20 did not have just 6-bit bytes, or 7-bit bytes, or 8-bit bytes, or 9-bit bytes. It had an arbitrary byte size from 1 to 36 bits. – Lars Brinkhoff May 07 '15 at 06:53

score 19 · Answer 4 · answered Jan 20 '10 at 00:38

A few of which I'm aware:

DEC PDP-10: variable, but most often 7-bit chars packed 5 per 36-bit word, or else 9 bit chars, 4 per word
Control Data mainframes (CDC-6400, 6500, 6600, 7600, Cyber 170, Cyber 176 etc.) 6-bit chars, packed 10 per 60-bit word.
Unisys mainframes: 9 bits/byte
Windows CE: simply doesn't support the `char` type at all -- requires 16-bit wchar_t instead
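The packing schemes listed above (5 × 7-bit or 4 × 9-bit characters in a 36-bit word, 10 × 6-bit characters in a 60-bit word) can be emulated on a modern machine with shifts and masks. The sketch below assumes the first character occupies the most significant bits and any leftover bits sit at the low end. `pack_left` and `unpack_left` are hypothetical names, and the sketch does not model real hardware byte instructions (such as the PDP-10's).

```c
#include <stdint.h>

/* Pack `count` fields of `width` bits into the top of a `word_bits`-wide
   word, first field most significant. Assumes count * width <= word_bits
   <= 64. Leftover low bits stay zero: 5 x 7 = 35 bits in a 36-bit word
   leaves one spare bit. */
uint64_t pack_left(const unsigned *vals, int count, int width, int word_bits)
{
    uint64_t mask = ((uint64_t)1 << width) - 1;
    uint64_t word = 0;
    for (int i = 0; i < count; ++i) {
        int shift = word_bits - (i + 1) * width;
        word |= ((uint64_t)vals[i] & mask) << shift;
    }
    return word;
}

/* Extract field `index` (0 = most significant) from a word packed as above. */
unsigned unpack_left(uint64_t word, int index, int width, int word_bits)
{
    uint64_t mask = ((uint64_t)1 << width) - 1;
    int shift = word_bits - (index + 1) * width;
    return (unsigned)((word >> shift) & mask);
}
```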

answered Jan 20 '10 at 00:38

Jerry Coffin

@ephemient:I'm pretty sure there was at least one (pre-standard) C compiler for the PDP-10/DecSystem 10/DecSystem 20. I'd be *very* surprised at a C compiler for the CDC mainframes though (they were used primarily for numeric work, so the Fortran compiler was the big thing there). I'm pretty sure the others do have C compilers. – Jerry Coffin Jan 20 '10 at 01:28

Did the Windows CE compiler really not support the `char` type at all? I know that the system libraries only supported the wide char versions of functions that take strings, and that at least some versions of WinCE removed the ANSI string functions like strlen, to stop you doing char string-handling. But did it really not have a char type at all? What was `sizeof(TCHAR)`? What type did malloc return? How was the Java `byte` type implemented? – Steve Jessop Jan 20 '10 at 01:33

@Steve:Well, it's been a while since I wrote any code for CE, so I can't swear to it, but my recollection is that even attempting to define a char variable leads to a compiler error. Then again, that *is* depending on my memory, which means it isn't exactly certain. – Jerry Coffin Jan 20 '10 at 01:35

How strange. And certainly not C. I worked at a company with a multi-platform product that included at least two versions of WinCE, but I never interacted much with Windows code, and the portable code in the product (that is, most of the product) wasn't compiled with Microsoft's compiler. – Steve Jessop Jan 20 '10 at 01:39

Windows CE supports char, which is a byte. See Craig McQueen's comment on Richard Pennington's answer. Bytes are needed just as much in Windows CE as everywhere else, no matter what sizes they are everywhere else. – Windows programmer Jan 20 '10 at 03:44

Huh, I thought C skipped over the PDP-10. But perhaps there was a port; all of this is before my time anyhow ;-) – ephemient Jan 20 '10 at 05:53

There are (were?) at least two implementations of C for the PDP-10: KCC and a port of gcc (http://pdp10.nocrew.org/gcc/). – AProgrammer Jan 21 '10 at 12:43

The C standard would not allow 7-bit chars packed 5 per 36-bit word (as you mentioned for the PDP-10), nor would it allow 6-bit chars, as you mentioned for the Control Data mainframes. See http://www.parashift.com/c++-faq-lite/intrinsic-types.html#faq-26.6 – Ken Bloom Aug 07 '11 at 23:58

@Ken: Quite true -- such implementations definitely would not conform with the standard (but most, if not all of them, were obsolete before the first C standard in any case). – Jerry Coffin Aug 08 '11 at 01:29

@Jerry: BTW, I didn't mean to say you couldn't implement a C compiler on that hardware, just that you'd have to use different `char` sizes to do it. – Ken Bloom Aug 08 '11 at 03:55
I know about four C compilers for the PDP-10: C10, KCC, PCC, and GCC. – Lars Brinkhoff Mar 07 '17 at 07:12

score 16 · Answer 5 · answered Nov 28 '13 at 17:14

There is no such thing as completely portable code. :-)

Yes, there may be various byte/char sizes. Yes, there may be C/C++ implementations for platforms with highly unusual values of CHAR_BIT and UCHAR_MAX. Yes, sometimes it is possible to write code that does not depend on char size.

However, almost no real code is standalone. E.g. you may be writing code that sends binary messages to the network (the protocol is not important). You may define structures that contain the necessary fields. Then you have to serialize them. Just binary-copying a structure into an output buffer is not portable: generally you know neither the byte order of the platform nor the alignment of the structure members, so the structure just holds the data but does not describe how the data should be serialized.

OK. You may perform byte order transformations and move the structure members (e.g. uint32_t or similar) into the buffer using memcpy. Why memcpy? Because there are a lot of platforms where it is not possible to write a 32-bit value (16-bit or 64-bit, no difference) when the target address is not aligned properly.
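A different technique from memcpy plus a byte-order transformation is to build each octet with shifts. This avoids both alignment and host byte order concerns. The sketch below assumes the buffer holds one octet per `unsigned char`: only the low 8 bits of each element are used, so it also works when CHAR_BIT > 8. `put_u32_be` and `get_u32_be` are hypothetical names, and big-endian (network order) is chosen only for illustration.

```c
#include <stdint.h>

/* Write v as four octets, most significant first (network byte order).
   Each element receives exactly 8 bits, whatever CHAR_BIT is. */
void put_u32_be(unsigned char *buf, uint_least32_t v)
{
    buf[0] = (unsigned char)((v >> 24) & 0xFFu);
    buf[1] = (unsigned char)((v >> 16) & 0xFFu);
    buf[2] = (unsigned char)((v >> 8) & 0xFFu);
    buf[3] = (unsigned char)(v & 0xFFu);
}

/* Read four octets back; masking ignores any bits above the low 8. */
uint_least32_t get_u32_be(const unsigned char *buf)
{
    return ((uint_least32_t)(buf[0] & 0xFFu) << 24) |
           ((uint_least32_t)(buf[1] & 0xFFu) << 16) |
           ((uint_least32_t)(buf[2] & 0xFFu) << 8) |
           (uint_least32_t)(buf[3] & 0xFFu);
}
```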

So, you have already done a lot to achieve portability.

And now the final question. We have a buffer. The data from it is sent to a TCP/IP network. Such a network assumes 8-bit bytes. The question is: what type should the buffer be if your chars are 9-bit? If they are 16-bit? 24? Maybe each char corresponds to one 8-bit byte sent to the network, and only 8 bits are used? Or maybe multiple network bytes are packed into 24/16/9-bit chars? That's a question, and it is hard to believe there is a single answer that fits all cases. A lot depends on the socket implementation for the target platform.

So, what am I getting at? Usually code can be made portable to a certain extent relatively easily. It's very important to do so if you expect to use the code on different platforms. However, improving portability beyond that point requires a lot of effort and often gives little, as real code almost always depends on other code (the socket implementation in the example above). I am sure that for about 90% of code, the ability to work on platforms with bytes other than 8 bits is almost useless, because that code runs in an environment that is bound to 8-bit bytes. Just check the byte size and perform a compile-time assertion. You will almost surely have to rewrite a lot for a highly unusual platform.

But if your code is highly "standalone" -- why not? You may write it in a way that allows different byte sizes.

If one stores one octet per `unsigned char` value there should be no portability problems unless code uses aliasing tricks rather than shifts to convert sequences of octets to/from larger integer types. Personally, I think the C standard should define intrinsics to pack/unpack integers from sequences of shorter types (most typically `char`) storing a fixed guaranteed-available number of bits per item (8 per `unsigned char`, 16 per `unsigned short`, or 32 per `unsigned long`). — supercat, Jul 25 '15 at 19:42

score 9 · Answer 6 · answered Jan 20 '10 at 01:02

It appears that you can still buy an IM6100 (i.e. a PDP-8 on a chip) out of a warehouse. That's a 12-bit architecture.

answered Jan 20 '10 at 01:02

dmckee --- ex-moderator kitten

score 9 · Answer 7 · answered Jan 20 '10 at 03:18

Many DSP chips have 16- or 32-bit char. TI routinely makes such chips for example.

answered Jan 20 '10 at 03:18

Alok Singhal

petantik · Answer 8 · 2010-01-20T00:17:16.660

5

The C and C++ programming languages, for example, define byte as "addressable unit of data large enough to hold any member of the basic character set of the execution environment" (clause 3.6 of the C standard). Since the C char integral data type must contain at least 8 bits (clause 5.2.4.2.1), a byte in C is at least capable of holding 256 different values. Various implementations of C and C++ define a byte as 8, 9, 16, 32, or 36 bits

Quoted from http://en.wikipedia.org/wiki/Byte#History

Not sure about other languages though.

http://en.wikipedia.org/wiki/IBM_7030_Stretch#Data_Formats

That page defines a byte on that machine as variable-length.

edited Jan 20 '10 at 00:17

answered Jan 20 '10 at 00:08

petantik

"Not sure about other languages though" -- historically, most languages allowed the machine's architecture to define its own byte size. Actually historically so did C, until the standard set a lower bound at 8. – Windows programmer Jan 20 '10 at 03:45

PrgTrdr · Answer 9 · 2010-04-18T17:21:05.553

4

The DEC PDP-8 family had a 12 bit word although you usually used 8 bit ASCII for output (on a Teletype mostly). However, there was also a 6-BIT character code that allowed you to encode 2 chars in a single 12-bit word.

edited Apr 18 '10 at 17:21

answered Mar 10 '10 at 13:46

PrgTrdr

score 3 · Answer 10 · answered Aug 24 '12 at 07:46

what sort of consideration is it worth giving to platforms with non-8-bit char?

magic numbers occur e.g. when shifting;

most of these can be handled quite simply by using CHAR_BIT and e.g. UCHAR_MAX instead of 8 and 255 (or similar).

hopefully your implementation defines those :)

those are the "common" issues.....
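For example, joining two `unsigned char` values into a wider integer means shifting by the width of a char. Writing `CHAR_BIT` and `UCHAR_MAX` instead of 8 and 255 keeps that correct when char is 9, 16 or 24 bits. This is a sketch that assumes `2 * CHAR_BIT` does not exceed the width of `unsigned long long`, which holds for any CHAR_BIT up to 32. The function names are illustrative.

```c
#include <limits.h>

/* Join two chars into one value, with hi in the upper CHAR_BIT bits. */
unsigned long long join_uchars(unsigned char hi, unsigned char lo)
{
    return ((unsigned long long)hi << CHAR_BIT) | lo;
}

/* Split it again; UCHAR_MAX is the all-ones mask for one char. */
unsigned char high_uchar(unsigned long long v)
{
    return (unsigned char)((v >> CHAR_BIT) & UCHAR_MAX);
}

unsigned char low_uchar(unsigned long long v)
{
    return (unsigned char)(v & UCHAR_MAX);
}
```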

another indirect issue is say you have:

typedef unsigned char uchar;

struct xyz {
   uchar baz;
   uchar blah;
   uchar buzz;
};

this might "only" take (best case) 24 bits on one platform, but might take e.g. 72 bits elsewhere.....

if each uchar held "bit flags" and each uchar only had 2 "significant" bits or flags that you were currently using, and you only organized them into 3 uchars for "clarity", then it might be relatively "more wasteful" e.g. on a platform with 24-bit uchars.....

nothing bitfields can't solve, but they have other things to watch out for ....
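One way to avoid spending three chars on six flag bits is to pack all the flags into a single integer with masks. Each platform then spends one small integer, of whatever width, instead of three. This is a sketch with hypothetical flag names. It uses `uint_least8_t` because, unlike `uint8_t`, that type is guaranteed to exist and is at least 8 bits wide.

```c
#include <stdint.h>

/* Hypothetical flag names: two flags for each of the former fields. */
enum {
    BAZ_A  = 1u << 0, BAZ_B  = 1u << 1,
    BLAH_A = 1u << 2, BLAH_B = 1u << 3,
    BUZZ_A = 1u << 4, BUZZ_B = 1u << 5
};

/* At least 8 bits, so six flags always fit in one object. */
typedef uint_least8_t xyz_flags;

xyz_flags flag_set(xyz_flags f, unsigned mask)   { return (xyz_flags)(f | mask); }
xyz_flags flag_clear(xyz_flags f, unsigned mask) { return (xyz_flags)(f & ~mask); }
int       flag_test(xyz_flags f, unsigned mask)  { return (f & mask) != 0; }
```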

in this case, just a single enum might be a way to get the "smallest" sized integer you actually need....

perhaps not a real example, but stuff like this "bit" me when porting / playing with some code.....

just the fact that if a uchar is thrice as big as what is "normally" expected, 100 such structures might waste a lot of memory on some platforms..... where "normally" it is not a big deal.....

so things can still be "broken", or in this case "waste a lot of memory very quickly", due to the assumption that a uchar is "not very wasteful" relative to the RAM available, which holds on one platform but not on another.....

the problem might be more prominent e.g. for ints as well, or other types, e.g. you have some structure that needs 15 bits, so you stick it in an int, but on some other platform an int is 48 bits or whatever.....

"normally" you might break it into 2 uchars, but e.g. with a 24-bit uchar you'd only need one.....

so an enum might be a better "generic" solution ....

depends on how you are accessing those bits though :)

so, there might be "design flaws" that rear their head.... even if the code might still work/run fine regardless of the size of a uchar or uint...

there are things like this to watch out for, even though there are no "magic numbers" in your code ...

hope this makes sense :)

...what? Why do you think `enum` is likely to be smaller than other native types? Are you aware it defaults to the same storage as `int`? "you have some structure that needs 15 bits, so you stick it in an int, but on some other platform an int is 48 bits or whatever....." - so `#include <stdint.h>` and make it an `int16_t` for the best chance of minimising bit usage. I'm really not sure what you thought you were saying among all those ellipses. — underscore_d, Nov 22 '15 at 00:40

bta · Answer 11 · 2010-01-20T17:05:58.277

For one, Unicode characters can need more than 8 bits. As someone mentioned earlier, the C spec defines data types by their minimum sizes. Use sizeof and the values in limits.h if you want to interrogate your data types and discover exactly what size they are for your configuration and architecture.

For this reason, I try to stick to data types like uint16_t when I need a data type of a particular bit length.

Edit: Sorry, I initially misread your question.

The C spec says that a char object is "large enough to store any member of the execution character set". limits.h lists a minimum size of 8 bits, but the definition leaves the max size of a char open.

Thus, a char is at least as long as the largest character from your architecture's execution set (typically rounded up to the nearest 8-bit boundary). If your architecture has longer opcodes, your char size may be longer.

Historically, the x86 platform's opcode was one byte long, so char was initially an 8-bit value. Current x86 platforms support opcodes longer than one byte, but the char is kept at 8 bits in length since that's what programmers (and the large volumes of existing x86 code) are conditioned to.

When thinking about multi-platform support, take advantage of the types defined in stdint.h. If you use (for instance) a uint16_t, then you can be sure that this value is an unsigned 16-bit value on any architecture that provides the type, whether that 16-bit value corresponds to a char, short, int, or something else. The exact-width types are optional, so where uint16_t cannot exist your code fails to compile instead of silently using a different size. Most of the hard work has already been done by the people who wrote your compiler/standard libraries.
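Because `uint16_t` is optional, a platform whose char is 24 or 32 bits may not provide it at all. The sketch below shows one fallback, assuming what you need is 16-bit wraparound rather than an exact storage size. It tests for `UINT16_MAX`, falls back to `uint_least16_t` otherwise, and masks arithmetic to 16 bits. `u16` and `u16_add` are hypothetical names.

```c
#include <stdint.h>

#ifdef UINT16_MAX
typedef uint16_t u16;        /* exact 16-bit type is available */
#else
typedef uint_least16_t u16;  /* may be wider; arithmetic below masks it */
#endif

/* Addition modulo 2^16, correct whether or not u16 is exactly 16 bits. */
u16 u16_add(u16 a, u16 b)
{
    return (u16)(((unsigned long)a + b) & 0xFFFFu);
}
```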

If you need to know the exact size of a char because you are doing some low-level hardware manipulation that requires it, I typically use a data type that is large enough to hold a char on all supported platforms (usually 16 bits is enough) and run the value through a convert_to_machine_char routine when I need the exact machine representation. That way, the platform-specific code is confined to the interface function and most of the time I can use a normal uint16_t.

The question didn't ask about characters (whether Unicode or not). It asked about char, which is a byte. — Windows programmer, Jan 20 '10 at 03:41
Also, the execution character set has nothing to do with opcodes, it's the character set used at execution, think of cross-compilers. — ninjalj, Jul 08 '10 at 20:15
"Historically, the x86 platform's opcode was one byte long" : how sweet. *Historically*, C was developed on a PDP-11 (1972), long before x86 had been invented (1978). — Martin Bonner supports Monica, Mar 27 '18 at 07:54

score 2 · Answer 12 · answered Sep 24 '21 at 18:51

The weirdest one I saw was the CDC computers: 6-bit characters but with 65 encodings. [There was also more than one character set; you chose the encoding when you installed the OS.]

If a 60-bit word ended with 12, 18, 24, 30, 36, 42, or 48 bits of zero, that was the end-of-line character (e.g. '\n').
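Here is a rough sketch of the end-of-line rule described above. It assumes a 60-bit word held in the low bits of a `uint64_t`, ten 6-bit characters with the first one most significant, and that a line ends whenever the word's low 12 bits (its last two characters) are zero. This is a simplified reading of the answer, not CDC's full convention, and the function names are hypothetical.

```c
#include <stdint.h>

/* Character i (0 = leftmost) of a 60-bit word holding ten 6-bit codes. */
unsigned cdc_char(uint64_t word, int i)
{
    return (unsigned)((word >> (54 - 6 * i)) & 077);
}

/* A word whose last 12 bits (two characters) are zero ends a line;
   more trailing zero characters (18, 24, ... bits) also qualify. */
int cdc_ends_line(uint64_t word)
{
    return (word & 07777) == 0;
}
```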

Since the 00 (octal) character was : in some code sets, that meant BNF that used ::= was awkward if the :: fell in the wrong column. [This long preceded C++ and other common uses of ::.]

score 1 · Answer 13 · answered Jan 20 '10 at 00:42

ints used to be 16 bits (pdp11, etc.). Going to 32 bit architectures was hard. People are getting better: Hardly anyone assumes a pointer will fit in a long any more (you don't right?). Or file offsets, or timestamps, or ...
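The pointer-in-a-long assumption fails on LLP64 platforms such as 64-bit Windows, where long is 32 bits but pointers are 64. C99's `uintptr_t` is the type intended for this, although it is also optional. The sketch below uses `ptr_to_int` and `int_to_ptr` as hypothetical names.

```c
#include <stdint.h>

#ifndef UINTPTR_MAX
#error "no integer type can hold a pointer on this implementation"
#endif

/* A void * converted to uintptr_t and back compares equal to the original. */
uintptr_t ptr_to_int(void *p) { return (uintptr_t)p; }
void *int_to_ptr(uintptr_t n) { return (void *)n; }
```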

8 bit characters are already somewhat of an anachronism. We already need 32 bits to hold all the world's character sets.

answered Jan 20 '10 at 00:42

Richard Pennington

True. The name `char` is a bit quaint now in Unicode days. I care more about 8-bit units (octets) when dealing with binary data, e.g. file storage, network communications. `uint8_t` is more useful. – Craig McQueen Jan 20 '10 at 00:48

Unicode never needed a full 32 bits, actually. They originally planned for 31 (see the original UTF-8 work), but now they're [content with only 21 bits](http://en.wikipedia.org/wiki/Unicode_plane). They probably realized they wouldn't be able to print the book any more if they actually needed all 31 bits :P – me22 Aug 30 '13 at 04:34

@me22, Unicode originally planned for 16 bits. "Unicode characters are consistently 16 bits wide, regardless of language..." Unicode 1.0.0. http://www.unicode.org/versions/Unicode1.0.0/ch01.pdf. – Shannon Severance Jan 18 '16 at 22:27

ISO 10646 was originally 31 bits, and Unicode merged with ISO 10646, so it might be sloppy to say that Unicode was 31 bits, but it's not really untrue. Note they don't actually print the full code tables any more. – prosfilaes Jan 24 '20 at 15:15

score 1 · Answer 14 · answered May 28 '22 at 18:17

The Univac 1100 series had two operational modes: 6-bit FIELDATA and 9-bit 'ASCII' packed 6 or 4 characters respectively into 36-bit words. You chose the mode at program execution time (or compile time.) It's been a lot of years since I actually worked on them.
