13

What actually belongs to the "character type" in C11 — besides char of course?

To be more precise: the special exceptions for the character type (for example, that any object can be accessed by an lvalue expression of character type; see §6.5/7 in the C11 standard), to which concrete types do they apply? They seem to apply to uint8_t and int8_t from stdint.h, but is this guaranteed? On the other hand, gcc doesn't regard char16_t from uchar.h as a "character type".
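To make the exception concrete, here is a minimal sketch of the kind of access §6.5/7 permits: reading the bytes of an arbitrary object through an `unsigned char` pointer. The helper name `byte_sum` is made up for illustration and is not from any library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: adds up the bytes of any object.
   Reading through unsigned char is allowed by 6.5/7 whatever the
   effective type of the object is. */
unsigned long byte_sum(const void *obj, size_t size)
{
    assert(obj != NULL);
    const unsigned char *p = obj;   /* character-type lvalue access */
    unsigned long sum = 0;
    for (size_t i = 0; i < size; i++)
        sum += p[i];
    return sum;
}
```

Calling it as `byte_sum(&x, sizeof x)` is well defined for an `x` of any object type. The question is whether the same would hold if `p` were declared as `const uint8_t *`.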

viuser
  • Also `signed char` and `unsigned char`. – Kerrek SB Aug 08 '16 at 08:38
  • Note that `int8_t` and `uint8_t` are just aliases for existing types. – Kerrek SB Aug 08 '16 at 08:38
  • There have been serious proposals to base `int8_t` and `uint8_t` on extended integer types, functionally identical to `signed char` and `unsigned char` respectively _except_ that they would not count as "character types" for §6.5/7. As far as I know, no implementation has carried through this idea, but I'm not aware of any reason it's forbidden, either. (The advantage of this would be, for instance, that you could now have string pointers that _didn't_ alias all the other pointers in the program.) – zwol Aug 08 '16 at 13:16
  • @zwol Do you mean by using `std::basic_string`, etc.? – underscore_d Aug 08 '16 at 13:38
  • @underscore_d Essentially yes. It would be awkward on account of all the library functions that expect plain `char*` and/or `std::string`, but it could be done. I suspect careful use of `restrict` gets you at least 90% of the benefit, though. – zwol Aug 08 '16 at 14:30

2 Answers

8

Only char, signed char and unsigned char.[1]

The types uint8_t, int8_t, char16_t, or any type of the form intN_t or charN_t, may or may not be synonyms for a character type.


[1] Quoted from ISO/IEC 9899:201x, 6.2.5 Types, paragraph 15:
The three types char, signed char, and unsigned char are collectively called the character types.
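If you want to find out what a particular implementation does, a C11 `_Generic` selection can report whether an expression's type is one of the three character types. This is only a sketch: the macro `IS_CHARACTER_TYPE` and the function `uint8_is_character_type` are names invented here, and the result tells you about the compiler you ran it on, not about the standard.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro: 1 if the type of x is char, signed char or
   unsigned char, 0 for any other type. */
#define IS_CHARACTER_TYPE(x) \
    _Generic((x), char: 1, signed char: 1, unsigned char: 1, default: 0)

/* Returns 1 if uint8_t is a typedef for a character type here,
   0 if it is some other type, -1 if uint8_t does not exist. */
int uint8_is_character_type(void)
{
#ifdef UINT8_MAX
    return IS_CHARACTER_TYPE((uint8_t)0);
#else
    return -1;
#endif
}
```

Note that a character constant such as `'a'` has type `int` in C, so `IS_CHARACTER_TYPE('a')` yields 0.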

2501
  • If `sizeof(char) == sizeof(int8_t)` then `int8_t` *is* a character type? – viuser Aug 08 '16 at 08:51
  • @wolf-revo-cats No, that is not guaranteed. It could be theoretically typedefed as an extended integer type. – 2501 Aug 08 '16 at 08:55
  • @2501 The `intN_t` types are optional and cannot exist on a machine that does not have an exactly `N`-bit type. And since `char` types must be the smallest addressable unit on a machine, and must be at least 8 bits... if a machine supports the `int8_t` types, then they must be aliases to `[[un]signed] char`. Or have I missed a logical loophole somewhere? I guess some exotic machine could offer types with the same width but different signedness or bit representations... making them more suited to one or the other of `char` or `intN_t`. – underscore_d Aug 08 '16 at 12:37
  • @underscore_d The point 2501 was making is that a system could provide types with the same size and representation but that are "technically" different types (in that signatures for functions taking one type will not match the other, etc) – Random832 Aug 08 '16 at 13:27
7

char, signed char, and unsigned char are the character types in C11. This has been the case since C89.

Treating int8_t (or uint8_t) as a character type has many problems.

  1. They are optional.
  2. They cannot exist if CHAR_BIT > 8.
  3. int8_t is required to use 2's complement representation (which is the most common). But there are other representations, namely 1's complement and sign-magnitude, that the C standard also allows.

Since they are, if they exist, typedef'ed to an existing type, you can probably get away with using int8_t or uint8_t as a character type in practice. But the standard doesn't guarantee anything and there's no reason to treat them as such anyway when you have the real character types.
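The points above can be checked at compile time. The sketch below assumes only standard headers; the function name `has_exact_8bit_types` is made up for illustration. `<stdint.h>` defines `INT8_MAX` and `UINT8_MAX` only when the corresponding types exist, so the preprocessor can test for them.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* The standard guarantees at least 8 bits per char. */
static_assert(CHAR_BIT >= 8, "CHAR_BIT is at least 8");

#ifdef UINT8_MAX
/* uint8_t has exactly 8 bits and no padding, and no object is
   smaller than a char, so its existence implies 8-bit chars. */
static_assert(CHAR_BIT == 8, "uint8_t implies CHAR_BIT == 8");
#endif

/* Returns 1 if both exact-width 8-bit types are provided, else 0. */
int has_exact_8bit_types(void)
{
#if defined(INT8_MAX) && defined(UINT8_MAX)
    return 1;
#else
    return 0;
#endif
}
```

None of this says whether `uint8_t` is a character type; it only shows that the typedefs are optional and tied to `CHAR_BIT == 8`.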

Toby Speight
P.P
  • I previously used the `cstdint` typedefs but stopped and went back to good old `char`s for precisely the reasons you gave. It just makes sense, states intent properly as per the special allowances given to `char`s, and protects against potential accidents later (on some exotic implementation). – underscore_d Aug 08 '16 at 12:39
  • I would like to ask whether there actually exists an implementation where `char` is larger than 8 bits. If you look, for example, at GNU libc's memset at the lines `cccc = (unsigned char) c; cccc |= cccc << 8; cccc |= cccc << 16;` this *would* break if `char` is larger than 8 bits. So with GNU it seems to be implicitly guaranteed that `char` is 8 bits wide. – viuser Aug 09 '16 at 00:06
  • See: [What platforms have something other than 8-bit char?](http://stackoverflow.com/questions/2098149/what-platforms-have-something-other-than-8-bit-char) for some examples. Yes, that'd break glibc. But glibc typically uses arch-specific assembly code for memset, memcpy etc. So that code might not be the one *actually* used. Besides, the CHAR_BIT != 8 allowance is mainly for DSPs. You can reasonably assume CHAR_BIT == 8 on most desktop systems, and POSIX requires CHAR_BIT to be exactly 8. But the standard covers a lot of other systems as well. – P.P Aug 09 '16 at 08:20