21

The C standard states:

ISO/IEC 9899:1999, 6.2.5 paragraph 15 (p. 49)

The three types char, signed char, and unsigned char are collectively called the character types. The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char.

And indeed GCC defines it one way or the other depending on the target platform.

My question is: why does the standard do that? I can see nothing coming out of an ambiguous type definition except hideous and hard-to-spot bugs.
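
For example, here is a minimal sketch (my own illustration) of the kind of bug I mean:

```c
#include <stdio.h>

int main(void)
{
    char c = -1;     /* stored as -1 if char is signed, as UCHAR_MAX if unsigned */

    if (c == -1)     /* c is promoted to int before the comparison */
        puts("plain char is signed here");
    else
        puts("plain char is unsigned here: c now holds UCHAR_MAX, not -1");
    return 0;
}
```

The same source silently changes behavior between platforms (or between compiler flags such as GCC's -fsigned-char / -funsigned-char).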

What's more, in ANSI C (before C99) the only byte-sized type is char, so using char for arithmetic is sometimes inevitable. So saying "one should never use char for math" is not entirely true. If that were the intent, a saner decision would have been to include three types: "char, ubyte, sbyte".

Is there a reason for this, or is it just some weird backwards-compatibility gotcha, intended to allow bad (but common) compilers to be considered standard-compliant?

Elazar Leibovich
  • [Any compiler which takes 'char' as 'unsigned'?](http://stackoverflow.com/q/3728045/995714) – phuclv Feb 20 '16 at 03:27
  • [Why don't the C or C++ standards explicitly define char as signed or unsigned?](https://stackoverflow.com/q/15533115/995714). Because on some architectures unsigned char is more efficient and on some others the reverse is true – phuclv Dec 19 '19 at 00:16

5 Answers

26

"Plain" char having unspecified signed-ness allows compilers to select whichever representation is more efficient for the target architecture: on some architectures, zero extending a one-byte value to the size of "int" requires less operations (thus making plain char 'unsigned'), while on others the instruction set makes sign-extending more natural, and plain char gets implemented as signed.

  • Yup, whatever the hardware provides should be available directly to the language, with minimum sticky sugar on it. – dkretz May 27 '09 at 08:08
  • 9
    Then why not repeat the same story for unsigned/signed short? It should also be extended to int. – Elazar Leibovich May 27 '09 at 09:48
  • @ElazarLeibovich This is an insightful comment, but it is more common to sidestep the issue entirely by making `short` the same size as `int` (e.g. both 16-bit) than it is to make `char` the same size as `int`, although both are allowed by the C standards and both exist in the wild. And signedness of `char` does not seem important the way signedness of `short` does, making the compromise seem more acceptable. – Pascal Cuoq Aug 29 '12 at 13:55
  • @ElazarLeibovich: See my answer below; the character set may force `char` to be unsigned, or the relative ranges of `char` and `int` may force `char` to be signed. – supercat Jan 25 '14 at 04:07
12

Perhaps historically some implementations' "char" was signed and some was unsigned, and so to be compatible with both they couldn't define it as one or the other.

newacct
  • 6
    Correct. In the current world where every processor is either x86, Power, or SPARC, it's difficult to realise that in the 70s there were dozens of processors available with different architectures, from the elegant simplicity of 8-bit DECs to monster Burroughs 36-bit behemoths. Not even the size of a character was agreed on - XEROX machines worked on a 6-bit character set. – James Anderson May 27 '09 at 08:10
  • Why would the machine care about characters? Was there a CPU instruction to output characters? I know of no such thing in x86. – Elazar Leibovich May 27 '09 at 09:49
  • 1
    Yes, the reason was historical. And then, since we had unsigned char/signed char/plain char, for symmetry reasons we also have signed int/short, even though the "signed" is redundant for the other integer types. So in principle the intent is to have the signedness well defined, but it can't happen anymore for char - too much code would break. – Johannes Schaub - litb May 27 '09 at 10:48
6

In those good old days when C was defined, the character world was 7-bit, so the sign bit could be used for other things (like EOF).
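
A place where that legacy still shows up today (my own illustration, not part of the answer): getchar() returns an int precisely so that EOF (typically -1) stays distinct from every valid character value, which is why its result must not be stored in a plain char:

```c
#include <stdio.h>

int main(void)
{
    int c;                       /* int, not char, so EOF stays distinct */

    while ((c = getchar()) != EOF)
        putchar(c);

    /* With "char c" instead: on an unsigned-char implementation the
       comparison against EOF could never succeed, and on a signed-char
       implementation the loop would stop early on a legitimate 0xFF byte. */
    return 0;
}
```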

Peter Miehle
0

On some machines, a signed char would be too small to hold all the characters in the C character set (letters, digits, standard punctuation, etc.). On such machines, 'char' must be unsigned. On other machines, an unsigned char can hold values larger than a signed int can represent (since char and int are the same size); on those machines, 'char' must be signed.
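
A small sketch of my own (using only <limits.h>) that lets you inspect these constraints on a given implementation:

```c
#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* If the native character set has letters above SCHAR_MAX, plain char
       must be unsigned so those characters keep non-negative values
       (C99 6.2.5p3). Conversely, if char and int were the same size, an
       unsigned char could exceed INT_MAX, pushing plain char to be signed. */
    printf("CHAR_BIT  = %d\n", CHAR_BIT);
    printf("CHAR_MIN  = %d, CHAR_MAX = %d\n", CHAR_MIN, CHAR_MAX);
    printf("SCHAR_MAX = %d, INT_MAX  = %d\n", SCHAR_MAX, INT_MAX);
    printf("'A' stored in a char has value %d (guaranteed non-negative)\n",
           (int)(char)'A');
    return 0;
}
```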

supercat
0

I suppose (off the top of my head) that their thinking was along the following lines:

If you care about the sign of char (i.e., you are using it as a byte), you should explicitly choose signed char or unsigned char.
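
A minimal sketch of that convention (my own example, not the answerer's): keep plain char for text, and use an explicitly unsigned (or signed) type whenever the numeric value of the byte matters:

```c
#include <stddef.h>   /* for size_t */

/* Byte arithmetic with an explicitly unsigned type, so the result does not
   depend on the implementation-defined signedness of plain char. */
unsigned char checksum(const unsigned char *buf, size_t n)
{
    unsigned char sum = 0;
    for (size_t i = 0; i < n; i++)
        sum = (unsigned char)(sum + buf[i]);   /* wraps modulo UCHAR_MAX + 1 */
    return sum;
}
```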

hasen