2

Background

Several C++ references and Stack Overflow questions discuss the implementation-defined nature of char. That is, plain char in C++ may be either signed or unsigned, and the choice is left to the compiler or platform, according to the ARM Linux FAQ:

The above code is actually buggy in that it assumes that the type "char" is equivalent to "signed char". The C standards do say that "char" may either be a "signed char" or "unsigned char" and it is up to the compilers implementation or the platform which is followed.

This leaves the door open to both ambiguity and bad practice, including mistaking the signedness of a char when it is used as an 8-bit number. The Rationale for C gives some reasons for this design, but does not address the ambiguity it leaves open:

Three types of char are specified: signed, plain, and unsigned. A plain char may be represented as either signed or unsigned, depending upon the implementation, as in prior practice. The type signed char was introduced to make available a one-byte signed integer type on those systems which implement plain char as unsigned. For reasons of symmetry, the keyword signed is allowed as part of the type name of other integral types.
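To make the three-type rule concrete, here is a minimal sketch that asks the compiler which way plain char went on the current platform. The helper names `plain_char_is_signed` and `check_plain_char` are made up for illustration; the standard facilities used are `std::is_same`, `std::numeric_limits`, and the `<climits>` macros.

```cpp
#include <cassert>
#include <climits>
#include <limits>
#include <type_traits>

// Plain char is its own type, even though it shares its range and
// representation with one of the other two character types.
static_assert(!std::is_same<char, signed char>::value,
              "char and signed char are distinct types");
static_assert(!std::is_same<char, unsigned char>::value,
              "char and unsigned char are distinct types");

// Hypothetical helper: true when this implementation makes plain char signed.
constexpr bool plain_char_is_signed() {
    return std::numeric_limits<char>::is_signed;
}

// Hypothetical helper: checks that the limits of plain char match whichever
// of the two explicit types it follows on this platform.
void check_plain_char() {
    if (plain_char_is_signed()) {
        assert(CHAR_MIN == SCHAR_MIN);
        assert(CHAR_MAX == SCHAR_MAX);
    } else {
        assert(CHAR_MIN == 0);
        assert(CHAR_MAX == UCHAR_MAX);
    }
}
```

Calling `check_plain_char()` should pass on any conforming implementation; only the branch taken differs from platform to platform.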

It would seem advantageous to close the door on even the potential for ambiguity and leave only unsigned char and signed char as the two data types for the 8-bit unit. This prompted me to ask the question...

Question

Given the potential for ambiguity, why leave the signedness of the char data type implementation-defined?

isakbob
  • 2
    char types are a mess in C++. They serve 3 completely different purposes: character in a string, byte and integer, with no way to disambiguate between them in the type system. Try to `cout` a `std::int8_t` ... yeah ... – bolov Sep 12 '19 at 01:32
  • @bolov although your comment is conversational, it led me to post this comment as clarification: your comment is the premise of why I asked this question. – isakbob Sep 12 '19 at 01:35
  • 5
    Some processors prefer signed char, and others prefer unsigned char. For example, POWER can load an 8-bit value from memory with zero extension, but not sign extension. But SuperH-3 can load an 8-bit value from memory with sign extension but not zero extension. C++ derives from C, and C leaves many details of the language implementation-defined so that each implementation can be tailored to be most efficient for its target environment. – Raymond Chen Sep 12 '19 at 01:38
  • 1
    @RaymondChen that should be an answer – bolov Sep 12 '19 at 01:48
  • 1
    @RaymondChen I have made your comment a community wiki answer per the suggestion of bolov. – isakbob Sep 12 '19 at 01:52
  • Most of those liberties are given to allow better performance on some architectures, but they do make it more difficult to write portable code (which does not profit from that liberty). It was a bad choice IMO. – Jarod42 Sep 12 '19 at 01:54
  • `int8_t` should have been a non-character extended integer type, but the horse has bolted on that one – M.M Sep 12 '19 at 02:13
  • 1
    Remember that plain `char` has the same representation as either `signed char` or `unsigned char`, but they are nevertheless three distinct and incompatible types. – Keith Thompson Sep 12 '19 at 02:27

1 Answer

5

Some processors prefer signed char, and others prefer unsigned char. For example, POWER can load an 8-bit value from memory with zero extension, but not sign extension. But SuperH-3 can load an 8-bit value from memory with sign extension but not zero extension. C++ derives from C, and C leaves many details of the language implementation-defined so that each implementation can be tailored to be most efficient for its target environment.
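The difference between the two kinds of load can be sketched in C++ itself. This is only an illustration, assuming 8-bit bytes and two's complement representation; the function names are hypothetical and say nothing about the instructions a particular compiler emits for POWER or SuperH-3.

```cpp
#include <cassert>
#include <limits>

// Reading through signed char copies the high bit into the upper bits of
// the int (sign extension).
int load_sign_extended(const signed char* p) { return *p; }

// Reading through unsigned char fills the upper bits with zeros
// (zero extension).
int load_zero_extended(const unsigned char* p) { return *p; }

// Reading through plain char does whichever of the two the implementation
// chose for char.
int load_plain(const char* p) { return *p; }

// Hypothetical self-check. Assumes CHAR_BIT == 8 and two's complement,
// so that -1 and 0xFF share the same bit pattern.
void check_extension() {
    const signed char s = -1;
    const unsigned char u = 0xFF;

    assert(load_sign_extended(&s) == -1);
    assert(load_zero_extended(&u) == 255);

    // The same byte read as plain char yields one result or the other.
    const int plain = load_plain(reinterpret_cast<const char*>(&u));
    assert(plain == (std::numeric_limits<char>::is_signed ? -1 : 255));
}
```

On a target that only has one of the two load forms, the other needs an extra instruction after the load, which is the cost the implementation avoids by picking the matching signedness for plain char.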

isakbob
  • 3
    Importantly, when you're using `char` as actual character data, the sign doesn't matter (your weirdo ASCII superset's glyphs can be referenced with negative values just as easily as with positive values). So plain `char` using whichever type is more efficient is fine in that case. It's only when you use it for math that the implementation defined signedness is an issue. In that case you should explicitly specify signedness, or just use the stdint types like `uint8_t`/`int8_t` to make it clear you're relying on the numeric behavior, not just storing characters. – ShadowRanger Sep 12 '19 at 01:56
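As a rough sketch of that advice, the helpers below (hypothetical names `sum_bytes` and `to_decimal`) make the signedness explicit whenever bytes are used as numbers. The cast to `int` in `to_decimal` is there because `std::int8_t` is usually an alias for `signed char`, so streaming it directly would typically print a character rather than a number.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Sums the bytes of a string as values in the range 0..255, independent of
// whether plain char is signed on this platform.
unsigned sum_bytes(const std::string& text) {
    unsigned total = 0;
    for (char c : text) {
        // Conversion to unsigned char is well defined (modulo 2^CHAR_BIT).
        total += static_cast<unsigned char>(c);
    }
    return total;
}

// Formats an 8-bit integer as decimal text by widening it to int first.
std::string to_decimal(std::int8_t value) {
    std::ostringstream out;
    out << static_cast<int>(value);
    return out.str();
}

// Hypothetical self-check, assuming 8-bit bytes.
void check_numeric_bytes() {
    assert(sum_bytes("\xFF\x01") == 256u);  // 255 + 1, never -1 + 1
    assert(to_decimal(std::int8_t{65}) == "65");
    assert(to_decimal(std::int8_t{-5}) == "-5");
}
```

Without the cast in `sum_bytes`, the first byte would contribute -1 on platforms where plain char is signed and 255 where it is unsigned, which is exactly the ambiguity discussed in the question.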