
The code below compiles, but has different behavior for the char type than for the int types.

In particular

   cout << getIsTrue< isX<int8>::ikIsX  >() << endl;
   cout << getIsTrue< isX<uint8>::ikIsX  >() << endl;
   cout << getIsTrue< isX<char>::ikIsX  >() << endl;

result in three template instantiations, for three distinct types: int8, uint8, and char. What gives?

The same is not true for the ints: int and int32 result in the same template instantiation (hence the ambiguity below), while uint32 gives another.

The reason seems to be that C++ treats char, signed char, and unsigned char as three distinct types, whereas int is the same type as signed int. Is this right, or am I missing something?

#include <iostream>

using namespace std;

typedef   signed char       int8;
typedef unsigned char      uint8;
typedef   signed short      int16;
typedef unsigned short     uint16;
typedef   signed int        int32;
typedef unsigned int       uint32;
typedef   signed long long  int64;
typedef unsigned long long uint64;

struct TrueType {};
struct FalseType {};

template <typename T>
struct isX
{
   typedef typename T::ikIsX ikIsX;
};


// This  int==int32 is ambiguous
//template <>            struct isX<int  >    { typedef FalseType ikIsX; };  // Fails
template <>            struct isX<int32  >  { typedef FalseType ikIsX; };
template <>            struct isX<uint32 >  { typedef FalseType ikIsX; };


// Why isn't this ambiguous? char==int8
template <>            struct isX<char  >  { typedef FalseType ikIsX; };
template <>            struct isX<int8  >  { typedef FalseType ikIsX; };
template <>            struct isX<uint8 >  { typedef FalseType ikIsX; };


template <typename T> bool getIsTrue();
template <>           bool getIsTrue<TrueType>() { return true; }
template <>           bool getIsTrue<FalseType>() { return false; }

int main(int, char **t )
{
   cout << sizeof(int8) << endl;  // 1
   cout << sizeof(uint8) << endl; // 1
   cout << sizeof(char) << endl;  // 1

   cout << getIsTrue< isX<int8>::ikIsX  >() << endl;
   cout << getIsTrue< isX<uint8>::ikIsX  >() << endl;
   cout << getIsTrue< isX<char>::ikIsX  >() << endl;

   cout << getIsTrue< isX<int32>::ikIsX  >() << endl;
   cout << getIsTrue< isX<uint32>::ikIsX  >() << endl;
   cout << getIsTrue< isX<int>::ikIsX  >() << endl;

}

I'm using g++ 4.something

user48956
  • You should also note that there is no guarantee that `int8_t` is going to be a `signed char` and `uint8_t` is going to be an `unsigned char`. In particular, on Solaris `int8_t` is just `char` if `char` is signed. In other words, your code will fail to compile there. – Michał Górny Oct 11 '16 at 15:05
  • "int and uint32 which result in the same template instantiation, and signed int another" this should definitely be the other way around as int is signed. – Felix Dombek Mar 07 '18 at 13:24

4 Answers


Here is your answer from the standard:

3.9.1 Fundamental types [basic.fundamental]

Objects declared as characters (char) shall be large enough to store any member of the implementation's basic character set. If a character from this set is stored in a character object, the integral value of that character object is equal to the value of the single character literal form of that character. It is implementation-defined whether a char object can hold negative values. Characters can be explicitly declared unsigned or signed. Plain char, signed char, and unsigned char are three distinct types. A char, a signed char, and an unsigned char occupy the same amount of storage and have the same alignment requirements (basic.types); that is, they have the same object representation. For character types, all bits of the object representation participate in the value representation. For unsigned character types, all possible bit patterns of the value representation represent numbers. These requirements do not hold for other types. In any particular implementation, a plain char object can take on either the same values as a signed char or an unsigned char; which one is implementation-defined.

Greg Rogers
  • In summary the declaration determines how to interpret any one bit pattern, and the implementation determines the method of casting between them. So while they are all an equal number of bits, the `char` does not represent a number and any math performed either involves implicit casting to signed/unsigned or some algorithmic behavior that only superficially appears as math. (An analogy might be using operator+ overloaded to modify structs.) Char is not required to be 8-bit or one byte (some machines do not use 8-bit bytes); in such cases 8-bit math will require types int8 or uint8. – Max Power Jun 14 '22 at 03:08
  • Especially when you get into encodings like utf and the upper unicode set, you want consistent behavior based on the represented character independent of the underlying character set encoding. As such implicitly treating a char type as a numerical int is bad design that creates ambiguity, the multiple uses of "char". `unsigned char` and `signed char` should be deprecated in favor of explicit int8/uint8 for numerical purposes and plain `char` explicitly reserved for character enumeration. Especially considering the size of `char` can change to suit various character sets. – Max Power Jun 14 '22 at 03:29

While most integral types, like short and int, are signed by default, char has no default signedness in C++.

It is neither the type signed char nor unsigned char, so each implementation decides whether plain char behaves as signed or unsigned.

This is a common pitfall for C++ programmers who use char as an 8-bit integer type.

Drew Dormann
  • +1 because you very succinctly explain the differences in the data types and imply how they should be used by comparison. – Jun 23 '17 at 13:31
  • Historical footnote: I heard this was because early versions of C didn't specify the signedness of `char`, so different compilers did different things, and then the standard preserved this behavior so that old code would keep working on their same compilers. – Mooing Duck Sep 29 '20 at 18:31
  • Implementation does not decide if char is signed or unsigned. `Char` is neither because it is not a numerical representation, it is only character enumeration, which is interpreted to some arbitrary character code set. What is implementation defined is how a `char` primitive can be implicitly or explicitly cast to and from a numerical data representation. The use of unsigned and signed char are numerical but should be deprecated in favor of int8 and uint8 to avoid the confusion of the poor type name and char is not required to be an 8bit byte, further undermining numerical utility. – Max Power Jun 14 '22 at 03:39

For questions such as this, I like to look into the Rationale document for C, which often provides answers to C++ mysteries that arise for me when reading the Standard. It has this to say about it:

Three types of char are specified: signed, plain, and unsigned. A plain char may be represented as either signed or unsigned, depending upon the implementation, as in prior practice. The type signed char was introduced to make available a one-byte signed integer type on those systems which implement plain char as unsigned. For reasons of symmetry, the keyword signed is allowed as part of the type name of other integral types.

Rationale for C

Johannes Schaub - litb
  • So, why do we need `signed char`? Just in order to use it to represent a one-byte signed integer? – Alcott Aug 16 '12 at 01:46
  • @Alcott I think `char` might be signed, or might be unsigned, which is implementation defined, but `signed char` is always signed, and `unsigned char` is always unsigned, if you want to be sure/explicit of the type – hanshenrik Aug 08 '17 at 16:15

That's correct: char, unsigned char, and signed char are separate types. It probably would have been nice if char were just a synonym for either signed char or unsigned char, depending on your compiler's implementation, but the standard says they are separate types.

Evan Teran
  • 87,561
  • 32
  • 179
  • 238