14

Consider the following example:

#include <stdio.h>

int main(void)
{
    unsigned char a  = 15; /* one byte */
    unsigned short b = 15; /* two bytes */
    unsigned int c   = 15; /* four bytes */

    long x = -a; /* eight bytes */
    printf("%ld\n", x);

    x = -b;
    printf("%ld\n", x);

    x = -c;
    printf("%ld\n", x);

    return 0;
}

To compile I am using GCC 4.4.7 (and it gave me no warnings):

gcc -g -std=c99 -pedantic-errors -Wall -W check.c

My result is:

-15
-15
4294967281

The question is: why are both the unsigned char and unsigned short values "propagated" correctly to (signed) long, while the unsigned int value is not? Is there any reference or rule on this?

Here are the corresponding results from gdb (words are in little-endian order):

(gdb) x/2w &x
0x7fffffffe168: 11111111111111111111111111110001    11111111111111111111111111111111 

(gdb) x/2w &x
0x7fffffffe168: 11111111111111111111111111110001    00000000000000000000000000000000
Shafik Yaghmour
Grzegorz Szpetkowski

5 Answers

12

This is due to how the integer promotions are applied to the operand and the requirement that the result of unary minus has the promoted type. This is covered in section 6.5.3.3 Unary arithmetic operators, which says (emphasis mine going forward):

The result of the unary - operator is the negative of its (promoted) operand. The integer promotions are performed on the operand, and the result has the promoted type.

Integer promotion is covered in the draft C99 standard, section 6.3.1.1 Boolean, characters, and integers (under 6.3 Conversions), which says:

If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.

In the first two cases, the promotion will be to int and the result will be int. In the case of unsigned int, no promotion takes place, so the result also has type unsigned int, and the negated value has to be brought back into the range of unsigned int.

The -15 is converted to unsigned int using the rules set out in section 6.3.1.3 Signed and unsigned integers which says:

Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.

So we end up with -15 + (UMAX + 1), which is UMAX - 14, a large unsigned value. This is why you will sometimes see code convert -1 to an unsigned type to obtain the maximum value of that type: the result is always -1 + UMAX + 1, which is UMAX.

Shafik Yaghmour
3

int is special. Everything smaller than int gets promoted to int in arithmetic operations.

Thus -a and -b are applications of unary minus to int values of 15, which just work and produce -15. This value is then converted to long.

-c is different. c is not promoted to an int as it is not smaller than int. The result of unary minus applied to an unsigned int value of k is again an unsigned int, computed as 2^N - k for nonzero k (N is the number of bits in unsigned int).

Now this unsigned int value is converted to long normally.

n. m. could be an AI
  • 1
    Or rather, everything smaller than int is special. Those are called the small integer types. – Lundin Jun 02 '14 at 13:58
3

This behavior is correct. Quotes are from C 9899:TC2.

6.5.3.3/3:

The result of the unary - operator is the negative of its (promoted) operand. The integer promotions are performed on the operand, and the result has the promoted type.

6.2.5/9:

A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.

6.3.1.1/2:

The following may be used in an expression wherever an int or unsigned int may be used:

  • An object or expression with an integer type whose integer conversion rank is less than or equal to the rank of int and unsigned int.

  • A bit-field of type _Bool, int, signed int, or unsigned int.

If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.

So for long x = -a;, since the operand a, an unsigned char, has conversion rank less than the rank of int and unsigned int, and all unsigned char values can be represented as int (on your platform), we first promote to type int. The negative of that is simple: the int with value -15.

Same logic for unsigned short (on your platform).

The unsigned int c is not changed by promotion. So the value of -c is calculated using modular arithmetic, giving the result UINT_MAX-14.

aschepler
  • 1
    If I understand correctly, then unary arithmetic operator (minus) triggers let's call it the "initial integer promotion" (as it's not binary operator, so there is no second operand for further comparison) and assignment has nothing to do with that (it's more like next step after expression (r-value) is calculated). – Grzegorz Szpetkowski Jun 02 '14 at 16:30
2

C's integer promotion rules are what they are because standards-writers wanted to allow a wide variety of existing implementations that did different things, in some cases because they were created before there were "standards", to keep on doing what they were doing, while defining rules for new implementations that were more specific than "do whatever you feel like". Unfortunately, the rules as written make it extremely difficult to write code which doesn't depend upon a compiler's integer size. Even if future processors would be able to perform 64-bit operations faster than 32-bit ones, the rules dictated by the standards would cause a lot of code to break if int ever grew beyond 32 bits.

It would probably in retrospect have been better to have handled "weird" compilers by explicitly recognizing the existence of multiple dialects of C, and recommending that compilers implement a dialect that handles various things in consistent ways, but providing that they may also implement dialects which do them differently. Such an approach may end up ultimately being the only way that int can grow beyond 32 bits, but I've not heard of anyone even considering such a thing.

I think the root of the problem with unsigned integer types stems from the fact that they are sometimes used to represent numerical quantities, and are sometimes used to represent members of a wrapping abstract algebraic ring. Unsigned types behave in a manner consistent with an abstract algebraic ring in circumstances which do not involve type promotion. Applying a unary minus to a member of a ring should (and does) yield a member of that same ring which, when added to the original, will yield zero [i.e. the additive inverse]. There is exactly one way to map integer quantities to ring elements, but multiple ways exist to map ring elements back to integer quantities. Thus, adding a ring element to an integer quantity should yield an element of the same ring regardless of the size of the integer, and conversion from rings to integer quantities should require that code specify how the conversion should be performed. Unfortunately, C implicitly converts rings to integers in cases where either the size of the ring is smaller than the default integer type, or when an operation uses a ring member with an integer of a larger type.

A proper solution to this problem would be to allow code to specify that certain variables, return values, etc. should be regarded as ring types rather than numbers; an expression like -(ring16_t)2 should yield 65534 regardless of the size of int, rather than yielding 65534 on systems where int is 16 bits, and -2 on systems where it's larger. Likewise, (ring32)0xC0000001 * (ring32)0xC0000001 should yield (ring32)0x80000001 even if int happens to be 64 bits [note that if int is 64 bits, the compiler could legally do anything it likes if code tries to multiply two unsigned 32-bit values which equal 0xC0000001, since the result would be too large to represent in a 64-bit signed integer].

supercat
  • Thank you for the interesting answer. From what I have already learned, C99 has the `<stdint.h>` header, which contains exact-width aliases for existing types like `uint8_t`, `uint16_t` and so on. This greatly improves portability and, I guess, it allows ring-compliant behaviour to be forced by explicit casting, e.g. `(uint16_t)-2` should always evaluate to `65534`. – Grzegorz Szpetkowski Jun 02 '14 at 19:02
  • 1
    @GrzegorzSzpetkowski: The intention of `stdint.h` was to improve portability, but the default promotion rules mean that code which uses types like `uint16_t` but isn't cognizant of the size of `int` is likely to fail in subtle ways if the size of `int` changes. The only way to write portable code is to ensure that both operands of any operator get cast to the same type, and that any time the result of an operator acting on unsigned values is used as the operand of a divide, right-shift, or relational operator, it is explicitly cast to that same unsigned type. In other words... – supercat Jun 02 '14 at 19:15
  • ...portable code must always ensure that it doesn't matter what type promotion rules the compiler uses. From the standpoint of portability, C's type promotion rules are no more useful than would be a rule which required that all binary operators receive operands of matching type, and all binary operators used on types which could be smaller than `int` have their result cast to match their operand type [which is what truly portable code ends up being forced to do anyway]. – supercat Jun 02 '14 at 19:19
  • @GrzegorzSzpetkowski bear in mind that `uint16_t` etc. are most likely just typedefs for other types, so there is no new behaviour; you still have to contend with the integer promotions. It's not really an issue once you get used to it, as you should be used to thinking about the types of sub-expressions in order to correctly deal with things like using `<` between a signed and an unsigned operand. – M.M Jun 04 '14 at 03:25
  • @MattMcNabb: Unfortunately, the way the C rules are defined, there's no fixed promotion schedule among the various "fixed-sized" integers. The sum of an integer literal and a `uint32_t`, for example, is required to be a signed value in some compilers and an unsigned value in others. – supercat Jun 04 '14 at 04:27
  • @supercat Yeah - you need to know about this and code in such a way that it doesn't matter which the result is – M.M Jun 04 '14 at 04:29
  • @MattMcNabb: Is there any way to do that in general without having to add so many typecasts that the compiler might as well forbid most non-trivial expressions without typecasts? – supercat Jun 04 '14 at 04:32
  • @supercat if you think in terms of values and ranges rather than representations, things are clearer (well, they are for me anyway). If you want to negate in the range of uint64_t then do `x = -(uint64_t)val;`. – M.M Jun 04 '14 at 04:43
  • @MattMcNabb: While it's unlikely that `int` would be larger than 64 bits, the aforementioned expression would be erroneous if it were, unless `x` was type `uint64_t`, in which case the assignment would properly perform a modulo-based conversion. – supercat Jun 04 '14 at 04:45
  • If `int` were larger than 64 bits then this expression would still give the desired result. The idea is that `x` is `uint64_t` here. – M.M Jun 04 '14 at 04:46
0

Negatives are tricky, especially when it comes to unsigned values. If you look at the C documentation, you'll notice that (contrary to what you'd expect) unsigned chars and shorts are promoted to signed ints for computing, while an unsigned int will be computed as an unsigned int.

When you compute the -c, the c is treated as an int, it becomes -15, then is stored in x, (which still believes it is an UNSIGNED int) and is stored as such.

For clarification - No ACTUAL promotion is done when negating an unsigned. When you assign a negative to any type of int (or take a negative) the 2's complement of the number is instead used. Since the only practical difference between unsigned and signed values is that the MSB acts as a sign flag, it is taken as a very large positive number instead of a negative one.

Happington
  • First para OK, second para not: in `-c` the `c` is not treated as an `int` - it stays as `unsigned int` the whole time. The behaviour of `-` on unsigned integral types is defined without involving any conversions to signed or anything. – M.M Jun 04 '14 at 03:22
  • Perhaps I wasn't clear, I'll update my answer to reflect this. What I meant was "The operation affects the binary as if it was a signed value", i.e. it 2's complements whatever you have. It's defined, in that "it doesn't really care what the data is, it'll 2's complement it." – Happington Jun 04 '14 at 12:12