9

I have found that the C99 standard has a statement that denies compatibility between the type char and the types signed char/unsigned char.

Footnote 35 of the C99 standard:

CHAR_MIN, defined in <limits.h>, will have one of the values 0 or SCHAR_MIN, and this can be used to distinguish the two options. Irrespective of the choice made, char is a separate type from the other two and is not compatible with either.

My question is: why does the committee deny compatibility? What is the rationale? If char were compatible with signed char or unsigned char, would something terrible happen?
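For concreteness, here is a minimal sketch of what I understand the incompatibility to mean in practice (the variable names are just for illustration; the casts are needed because the pointer types don't mix):

    #include <stdio.h>

    int main(void)
    {
        char c = 'A';

        /* Writing `signed char *sp = &c;` without a cast would violate a
           constraint, because char* and signed char* are incompatible --
           even on an implementation where plain char is signed. */
        signed char   *sp = (signed char *)&c;
        unsigned char *up = (unsigned char *)&c;

        printf("%d %u\n", *sp, (unsigned)*up);
        return 0;
    }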

junwanghe
  • 187
  • 1
  • 7
  • See http://stackoverflow.com/questions/914242/why-is-chars-sign-ness-not-defined-in-c?rq=1 – Vaughn Cato Oct 07 '12 at 14:22
  • 2
    The problem is that if `char` is made compatible with one of them on your machine, it would be compatible with the other one on some other systems. It all depends on the underlying hardware. The committee decided to make it three different types as a compromise. – Bo Persson Oct 07 '12 at 14:36
  • @Vaughn Cato It's related but not the same. I don't understand the meaning of compatibility. – junwanghe Oct 08 '12 at 06:37
  • @BoPersson Your comment is really helpful, but I'm still not clear about the meaning of compatibility. Could you explain what compatibility means and why the concept was introduced? For basic types, is "compatible" the same as "identical"? – junwanghe Oct 08 '12 at 08:47
  • @junwanghe - A plain `char` has the same representation and values as *either* `signed char` or `unsigned char`. For efficiency reasons, C uses whatever the underlying hardware provides. That means a `char` will be slightly different on different computers. A choice was made to make the char types three different types, even though two of them are otherwise exactly the same (but not the same two on all systems). For other types this doesn't happen: `signed int` and `int` *are* the same type; it is just `char` that is special. – Bo Persson Oct 08 '12 at 09:03
  • @BoPersson I knew that. What I don't understand is the concept of compatibility. For example, with gcc on x86, char is defined to have the same range, representation, and behavior as signed char. What difference does it make whether char is compatible with signed char or not? – junwanghe Oct 08 '12 at 10:21
  • @BoPersson: While the representation of `char` must match that of either `signed char` or `unsigned char`, and all character types must be alias-compatible with each other, that does not mean that a `char**` will be in any way compatible with `signed char**` or `unsigned char**`. I don't think there's any good reason why a quality implementation wouldn't make them compatible, but some implementations value "optimization" over sanity. – supercat Jan 06 '17 at 20:01
  • @junwanghe For an implementation where `char` is signed, `char` and `signed char` are distinct types, but you can freely assign between them (there's an implicit conversion). There is no implicit conversion for pointer types (other than special cases involving `void*` and null pointer constants), so `char*` and `signed char*` cannot be assigned to each other without a cast. Making them incompatible lets the compiler diagnose errors that could be a real problem if you port your code to a system where plain `char` is unsigned. – Keith Thompson Jul 07 '21 at 19:15
  • As of Visual Studio 2019, Microsoft's C compiler has a known bug where it treats `char` and `signed char` as the same type. (Its C++ compiler doesn't have this bug.) https://developercommunity2.visualstudio.com/t/_Generic-char-signed-char-unsigned-cha/1228885?preview=true Microsoft's response is that "Our teams prioritize action on product issues with broad customer impact". – Keith Thompson Jul 07 '21 at 19:16

2 Answers

11

The roots are in compiler history. There were (are) essentially two C dialects in the Eighties:

  1. Where plain char is signed
  2. Where plain char is unsigned

Which of these should C89 have standardized? C89 chose to standardize neither, because it would have invalidated a large number of assumptions made in C code already written--what standards folks call the installed base. So C89 did what K&R did: leave the signedness of plain char implementation-defined. If you require a specific signedness, qualify your char. Modern compilers usually let you choose the dialect with an option (e.g. gcc's -funsigned-char).

The "terrible" thing that can happen if you ignore the distinction between (un)signed char and plain char is that if you do arithmetic and shifts without taking these details into account, you might get sign extension when you don't expect it, or vice versa (or even undefined behavior when shifting).
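As a small sketch of that kind of surprise (which branch runs below depends on the signedness of plain char on your implementation):

    #include <stdio.h>

    int main(void)
    {
        char          c = (char)0xFF;   /* implementation-defined value if plain char is signed */
        unsigned char u = 0xFF;

        printf("c promotes to %d, u promotes to %d\n", (int)c, (int)u);

        /* With a signed plain char, c promotes to -1 and the test fails;
           with an unsigned plain char, it succeeds. */
        if (c == 0xFF)
            puts("plain char is unsigned here");
        else
            puts("plain char is signed here");

        return 0;
    }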

There's also some dumb advice out there that recommends always declaring your chars with an explicit signed or unsigned qualifier. This works as long as you only work with pointers to such qualified types, but it requires ugly casts as soon as you deal with strings and string functions, all of which operate on pointer-to-plain-char, which is assignment-incompatible with the qualified pointer types. Such code quickly gets plastered with casts.
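For example, a hypothetical buffer declared as unsigned char needs a cast at every call into the standard string functions:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        unsigned char name[32];   /* illustrative buffer */

        /* strcpy, strlen and %s all traffic in plain char*, so every call
           needs a cast once the buffer is explicitly qualified. */
        strcpy((char *)name, "example");
        printf("%zu chars: %s\n", strlen((const char *)name), (char *)name);

        return 0;
    }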

The basic rules for chars are:

  • Use plain char for strings and if you need to pass pointers to functions taking plain char
  • Use unsigned char if you need to do bit twiddling and shifting on bytes
  • Use signed char if you need small signed values, but think about using int if space is not a concern
Jens
  • 69,818
  • 15
  • 125
  • 179
  • 2
    I know why there are three character types and their different responsibilities. But in a given implementation, char is defined to have the same range, representation, and behavior as either signed char or unsigned char. If the range, representation, and behavior are all the same, why is char not compatible with signed char/unsigned char? What is the meaning of compatibility? – junwanghe Oct 08 '12 at 08:25
  • The "terrible thing" that happened to me (and brought me here) was that I had no idea why my code used neither the template code for signed char nor the one for unsigned char when given a char argument... – Eike Aug 08 '16 at 09:12
  • There are a number of easy ways the Standard could have been improved. An obvious one here would be to require implementations to select and document one of four treatments for `char`: as an alias for `signed char`, as an alias for `unsigned char`, as a signed type incompatible with `signed char`, or as an unsigned type incompatible with `unsigned char`. On many systems, the first two choices would be easier to implement than the latter two, and I can't see any useful purpose served by forbidding them. – supercat Jan 06 '17 at 20:06
2

Think of signed char and unsigned char as the smallest arithmetic, integral types, just like signed short/unsigned short, and so forth with int, long int, long long int. Those types are all well-specified.

On the other hand, char serves a very different purpose: it's the basic type of I/O and communication with the system. It's not meant for computations, but rather serves as the unit of data. That's why you find char used in the command-line arguments, in the definition of "strings", in the FILE* functions and in other read/write-type I/O functions, as well as in the exception to the strict aliasing rule. This char type is deliberately less strictly defined so as to allow every implementation to use the most "natural" representation.
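A minimal sketch of that "unit of data" role, reading the object representation of another type through a character pointer (the aliasing rules allow this for any of the character types; unsigned char is used here only to avoid sign extension when printing, and the byte order you see depends on the machine):

    #include <stdio.h>

    int main(void)
    {
        int value = 0x01020304;

        /* Character pointers may alias any object, so this access is well-defined. */
        const unsigned char *bytes = (const unsigned char *)&value;

        for (size_t i = 0; i < sizeof value; ++i)
            printf("byte %zu: %02X\n", i, (unsigned)bytes[i]);

        return 0;
    }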

It's simply a matter of separating responsibilities.

(It is true, though, that char is layout-compatible with both signed char and unsigned char, so you may explicitly convert one to the other and back.)
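For instance (a trivial sketch of such a round trip, with purely illustrative values):

    #include <stdio.h>

    int main(void)
    {
        char c = 'A';

        unsigned char u    = (unsigned char)c;   /* explicit value conversion */
        char          back = (char)u;            /* and back again */

        printf("%c -> %u -> %c\n", c, (unsigned)u, back);
        return 0;
    }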

Kerrek SB
  • 464,522
  • 92
  • 875
  • 1,084
  • You can see `char` kind of like a `byte` without any integer type associated with it. – Sergey L. Oct 07 '12 at 14:16
  • @SergeyL.: Well, by definition `char` is the smallest addressable unit, so it is indeed what you would call a "byte". Just bear in mind that its number of bits is not fixed (though at least eight). – Kerrek SB Oct 07 '12 at 14:18