In modern C, a `char` is guaranteed to be independently modifiable, without disturbing surrounding data. It's usually chosen to be the width of the narrowest load/store instruction. So on early Alpha (which lacked byte load/store instructions) or on word-addressable CPUs, a `char` either had to be the word size, or else every `char` store would have to compile to an atomic RMW of the containing word (rather than the much cheaper non-atomic RMW that some early compilers actually used, before C11 introduced a thread-aware memory model to the language). See Can modern x86 hardware not store a single byte to memory? (which covers modern ISAs in general) and C++ memory model and race conditions on char arrays for the requirements C++11 and C11 place on `char`.
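To make "independently modifiable" concrete, here's a minimal C11 sketch (assuming a toolchain that ships `<threads.h>`; the array and thread-function names are made up for illustration): two threads store into adjacent bytes of the same array with no synchronization. C11 defines this as data-race-free, which is exactly why a conforming compiler can't implement a `char` store as a plain non-atomic load/modify/store of the containing word.

```c
// Sketch: adjacent char objects written by different threads, no locks.
// C11 requires this to be data-race-free, so neither store may clobber
// the neighbouring byte via a non-atomic RMW of a wider word.
#include <stdio.h>
#include <threads.h>

static char buf[2];   // two adjacent bytes, likely in the same machine word

static int writer0(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++)
        buf[0] = 'A';             // plain store; must not disturb buf[1]
    return 0;
}

static int writer1(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++)
        buf[1] = 'B';             // plain store; must not disturb buf[0]
    return 0;
}

int main(void)
{
    thrd_t t0, t1;
    thrd_create(&t0, writer0, NULL);
    thrd_create(&t1, writer1, NULL);
    thrd_join(t0, NULL);
    thrd_join(t1, NULL);
    // A conforming C11 implementation always prints "A B".
    printf("%c %c\n", buf[0], buf[1]);
    return 0;
}
```

On a machine without byte stores, keeping that guarantee for 8-bit `char` would mean something like an LL/SC loop on the containing word for every byte store, which is the cost the question is getting at.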
But that Wikipedia table of word and char sizes on historical machines is clearly not about that, given the sizes (e.g. char smaller than a word on some word-addressable machines, I'm pretty sure).
It's about how software (and character I/O hardware like terminals) packed multiple characters of the machine's native character encoding (e.g. a subset of ASCII, EBCDIC, or something earlier) into machine words.
Unicode, and variable-length character encodings like UTF-8 and UTF-16, are recent inventions compared to that history. https://en.wikipedia.org/wiki/Character_encoding#History
Many systems used fewer than 8 bits per character: e.g. 6 bits (64 unique codes) is enough for the upper- and lower-case Latin alphabet plus a few special characters and control codes, although in practice most 6-bit sets were upper-case only, which left room for digits and punctuation.
These historical character sets motivated some programming languages' choices of which special characters to use (or avoid), because those languages were developed on systems with a particular character set.
Historical machines really did do things like pack 3 characters of text into an 18-bit word.
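As a rough illustration of that kind of packing (a toy layout, not the exact format of any particular machine; the helper names and the high-to-low character order are my own assumptions), here's how three 6-bit codes fit into an 18-bit word, modelled with the low 18 bits of a `uint32_t`:

```c
// Toy sketch: pack three 6-bit character codes into one 18-bit "word",
// the way text was squeezed into memory on e.g. 18-bit machines.
#include <stdint.h>
#include <stdio.h>

// Pack c0, c1, c2 (each 0..63) into one 18-bit word, c0 in the high 6 bits.
static uint32_t pack3(unsigned c0, unsigned c1, unsigned c2)
{
    return ((uint32_t)(c0 & 077) << 12) |
           ((uint32_t)(c1 & 077) << 6)  |
            (uint32_t)(c2 & 077);
}

// Extract character i (0, 1, or 2) from an 18-bit word.
static unsigned unpack(uint32_t word, int i)
{
    return (word >> (12 - 6 * i)) & 077;
}

int main(void)
{
    uint32_t w = pack3(011, 022, 033);     // three arbitrary 6-bit codes
    printf("word = %06o\n", (unsigned)w);  // 18 bits = exactly 6 octal digits
    for (int i = 0; i < 3; i++)
        printf("char %d = %02o\n", i, unpack(w, i));
    return 0;
}
```

Each 6-bit code is exactly two octal digits, which is part of why octal notation was so natural on those machines.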
You might want to search on https://retrocomputing.stackexchange.com/, or even ask a question there after doing some more reading.