I am learning C++ using the books listed here. In particular, I learnt that the signedness of `char` depends on the compiler and the target platform. This means that on one implementation/platform `char` might be signed and on another it might be unsigned. In other words, we cannot portably write `char ch = 228;`, because on a system where `char` is signed, `228` is out of its range. For example, if you see this demo, you'll see that we get a warning in Clang.
Then I was surprised to learn that the type of `'\xe4'` is `char` and not `unsigned char`. I was surprised because `\xe4` corresponds to `228`, which is out of range on a system where `char` is signed. So I expected the type of `'\xe4'` to be `unsigned char`.
Thus, my question is: why did the standard choose to define the type of `'\xe4'` as `char` instead of `unsigned char`? I mean, `\xe4` is in the range of `unsigned char` but out of range for `char` (on a system where `char` is signed). So it seems natural/intuitive to me that `unsigned char` should have been used as the type of `'\xe4'`, so that it wouldn't have any platform/implementation dependence.
Note
Note that I am trying to make sense of what is happening here, and my current understanding might be wrong. I was curious about this, so I have asked this question to clarify my understanding further, as I've just started learning C++.
Note also that my question is not about whether we can portably write `char ch = 228;`, but rather about why the type of `'\xe4'` was chosen to be `char` instead of `unsigned char`.
Summary
Why is the type of a character literal `char`, even when the value of the literal falls outside the range of `char`? Wouldn't it make more sense to give the literal the type `unsigned char` when the value fits that range?