Why do arithmetic operations on unsigned chars promote them to signed integers?

Question

Many answers to similar questions point out that it is so due to the standard. But, I cannot understand the reasoning behind this decision by the standard setters.

From my understanding an unsigned char does not store the value in 2's complement form. So, I don't see a situation where let's say XORing two unsigned chars would produce unexpected behavior. Therefore, promoting them to int just seems like a waste of space (in most cases) and CPU cycles.

Moreover, why int? If a variable is being declared as unsigned, clearly the unsignedness is important to the programmer, therefore a promotion to an unsigned int would still make more sense than an int, in my opinion.

[EDIT #1] As pointed out in the comments, promotion to unsigned int takes place if an int cannot represent every value of the unsigned char.

[EDIT #2] To clarify the question: if this is about the performance benefit of operating on int rather than char, then why is it in the standard? It could have been given as a suggestion to compiler designers for better optimization. As it stands, a compiler that did not do this would not fully adhere to the C/C++ standard, even if, hypothetically, it supported all other required features of the language. In a nutshell, I cannot figure out why I cannot operate directly on unsigned chars, so the requirement to promote them to ints seems unnecessary. Can you give me an example which proves this wrong?

32/64 bit operations are faster in modern CPUs than 8 bit arithmetic. If the char is not an array, it's likely that it's already been stored as integer anyway. — Michael Chourdakis, May 27 '20 at 09:40
This might help http://www.idryman.org/blog/2012/11/21/integer-promotion/ — john, May 27 '20 at 09:41
What sort of answer do you want? (Note that the standard doesn't specify promotion of unsigned char to int, and may promote to unsigned int given suitable definitions of char and int). — Paul Hankin, May 27 '20 at 09:41
@john that article contains many errors. It's completely misguided in many regards. — Paul Hankin, May 27 '20 at 09:43
@MichaelChourdakis Fair enough, but shouldn't that be left up to the compiler to optimize rather than being made part of the standard? What if I try to compile this kind of statement on a machine where int = 16 bits? — DashwoodIce9, May 27 '20 at 09:44
For example if unsigned char and unsigned int both have 32 bits, unsigned char will promote to unsigned int, not int. — Paul Hankin, May 27 '20 at 09:47
@PaulHankin But how can an unsigned char ever have 32 bits? Isn't it the shortest primitive data type, set to a length of 8 bits? No more, no less than 8 bits, exactly that. — DashwoodIce9, May 27 '20 at 09:49
@DashwoodIce9: `char` is one byte, but one byte might be 32 bits (in future system). — Jarod42, May 27 '20 at 09:51
Did you check for yourself whether chars must be 8 bits before disagreeing? It seems not so hard to check, either in the standard itself or online. — Paul Hankin, May 27 '20 at 09:56
@PaulHankin, apologies. Yes, I did not check that. I was somehow under the delusion that chars are always 8 bits wide. Some research and Jarod's comment cleared that up. Thanks. I've updated the question accordingly. — DashwoodIce9, May 27 '20 at 10:59
The only thing I have to add is that a `char` is precisely `CHAR_BIT` bits wide. I'll concede that on most modern architectures `CHAR_BIT == 8` is true, but you should not assume that to hold universally (now, in the past, or the future). — Nelewout, May 27 '20 at 11:10
For C++, its [implicit type promotion rules](https://stackoverflow.com/a/46073296/4641116) were inherited from C. — Eljay, May 27 '20 at 11:24

Bob__ · Accepted Answer · 2020-05-27T12:04:36.743

You can find this document on-line: Rationale for International Standard - Programming Languages - C (Revision 5.10, 2003).

Chapter 6.3 (p. 44-45) is about conversions:

Between the publication of K&R and the development of C89, a serious divergence had occurred among implementations in the evolution of integer promotion rules. Implementations fell into two major camps which may be characterized as unsigned preserving and value preserving.

The difference between these approaches centered on the treatment of unsigned char and unsigned short when widened by the integer promotions, but the decision had an impact on the typing of constants as well (see §6.4.4.1).

The unsigned preserving approach calls for promoting the two smaller unsigned types to unsigned int. This is a simple rule, and yields a type which is independent of execution environment.

The value preserving approach calls for promoting those types to signed int if that type can properly represent all the values of the original type, and otherwise for promoting those types to unsigned int.

Thus, if the execution environment represents short as something smaller than int, unsigned short becomes int; otherwise it becomes unsigned int. Both schemes give the same answer in the vast majority of cases, and both give the same effective result in even more cases in implementations with two's complement arithmetic and quiet wraparound on signed overflow - that is, in most current implementations. In such implementations, differences between the two only appear when these two conditions are both true:

1. An expression involving an unsigned char or unsigned short produces an int-wide result in which the sign bit is set, that is, either a unary operation on such a type, or a binary operation in which the other operand is an int or “narrower” type.

2. The result of the preceding expression is used in a context in which its signedness is significant:

• sizeof(int) < sizeof(long) and it is in a context where it must be widened to a long type, or

• it is the left operand of the right-shift operator in an implementation where this shift is defined as arithmetic, or

• it is either operand of /, %, <, <=, >, or >=.

In such circumstances a genuine ambiguity of interpretation arises. The result must be dubbed questionably signed, since a case can be made for either the signed or unsigned interpretation. Exactly the same ambiguity arises whenever an unsigned int confronts a signed int across an operator, and the signed int has a negative value. Neither scheme does any better, or any worse, in resolving the ambiguity of this confrontation. Suddenly, the negative signed int becomes a very large unsigned int, which may be surprising, or it may be exactly what is desired by a knowledgeable programmer. Of course, all of these ambiguities can be avoided by a judicious use of casts.

One of the important outcomes of exploring this problem is the understanding that high-quality compilers might do well to look for such questionable code and offer (optional) diagnostics, and that conscientious instructors might do well to warn programmers of the problems of implicit type conversions.

The unsigned preserving rules greatly increase the number of situations where unsigned int confronts signed int to yield a questionably signed result, whereas the value preserving rules minimize such confrontations. Thus, the value preserving rules were considered to be safer for the novice, or unwary, programmer. After much discussion, the C89 Committee decided in favor of value preserving rules, despite the fact that the UNIX C compilers had evolved in the direction of unsigned preserving.

QUIET CHANGE IN C89

A program that depends upon unsigned preserving arithmetic conversions will behave differently, probably without complaint. This was considered the most serious semantic change made by the C89 Committee to a widespread current practice.

For reference, you can find more details about those conversions updated to C11 in this answer by Lundin.

Thank you Bob. The linked document clears up all doubts as to why implicit promotion to `int` takes place rather than `unsigned int`. I, however, still do not see the **need** to promote it in the first place. Lundin's answer mentions _"The harsh reality caused by the integer promotions means that almost no operation in C can be carried out on small types like `char` or `short`. Operations are always carried out on `int` or larger types."_ He then says that the compiler is allowed to optimize the code. I didn't get how it is being optimized though. Could you clarify that? — DashwoodIce9, May 27 '20 at 12:47
@DashwoodIce9 I believe it was a matter of performance expectations. See e.g. https://stackoverflow.com/questions/5069489/performance-of-built-in-types-char-vs-short-vs-int-vs-float-vs-double or https://stackoverflow.com/questions/5347042/are-char-and-small-int-slower-than-int . Having the calculations performed on types of the CPU word size should require fewer cycles. Now, with SIMD instruction sets, it's a bit more complicated. — Bob__, May 27 '20 at 13:09
@DashwoodIce9 -- re: optimization -- under the "as if" rule, the compiler is allowed to do things differently if the result is the same as it would have gotten by exactly following the rules. — Pete Becker, May 27 '20 at 16:23
Since this is tagged C++, I wanted to point out that the C++ standard ([cppreference link](https://en.cppreference.com/w/cpp/language/operator_arithmetic)) says something different -- adding two `unsigned short`s _should_ return an `unsigned short`, even though every compiler I've tried converts it to an `int`. — Spencer, Nov 19 '21 at 14:37
@Spencer I'm not sure where you read that. That link says: *"If the operand passed to an arithmetic operator is integral or unscoped enumeration type, then before any other action [...], the operand undergoes integral promotion."*. Following that link: *"In particular, arithmetic operators do not accept types smaller than `int` as arguments, and integral promotions are automatically applied after lvalue-to-rvalue conversion, if applicable. [...] `unsigned char` or `unsigned short` can be converted to `int` if it can hold its entire value range, and `unsigned int` otherwise;"* — Bob__, Nov 19 '21 at 15:22
@Bob__ Hmph. I didn't look at the integral promotion page; the language in the operator arithmetic page implies that the smaller types survive, `short` being mentioned as a separate conversion rank. — Spencer, Nov 19 '21 at 15:36
