
I have seen the following code in the book Computer Systems: A Programmer's Perspective, 2/E. It works well and creates the desired output, which can be explained by the difference between signed and unsigned representations.

#include <stdio.h>
int main() {
    if (-1 < 0u) {
        printf("-1 < 0u\n");
    }
    else {
        printf("-1 >= 0u\n");
    }
    return 0;
}

The code above yields -1 >= 0u; however, the following code, which I expected to behave the same as the above, does not! In other words,

#include <stdio.h>

int main() {

    unsigned short u = 0u;
    short x = -1;
    if (x < u)
        printf("-1 < 0u\n");
    else
        printf("-1 >= 0u\n");
    return 0;
}

yields -1 < 0u. Why does this happen? I cannot explain it.

Note that I have seen similar questions like this, but they do not help.

PS. As @Abhineet said, the dilemma can be solved by changing short to int. However, how can one explain this phenomenon? In other words, -1 in 4 bytes is 0xff ff ff ff and in 2 bytes is 0xff ff. Interpreting these two's-complement patterns as unsigned gives the values 4294967295 and 65535 respectively. Neither is less than 0, so I think that in both cases the output should be -1 >= 0u, i.e. x >= u.

A sample output for it on a little endian Intel system:

For short:

-1 < 0u
u =
 00 00
x =
 ff ff

For int:

-1 >= 0u
u =
 00 00 00 00
x =
 ff ff ff ff
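
For reference, a minimal sketch of how such a byte dump can be produced; the show_bytes helper is my own, not code from the book:

#include <stdio.h>

/* Print the raw bytes of an object, lowest address first. */
static void show_bytes(const char *label, const void *p, size_t n) {
    const unsigned char *b = p;
    printf("%s =\n", label);
    for (size_t i = 0; i < n; i++)
        printf(" %02x", b[i]);
    printf("\n");
}

int main(void) {
    unsigned short u = 0u;
    short x = -1;

    printf(x < u ? "-1 < 0u\n" : "-1 >= 0u\n");
    show_bytes("u", &u, sizeof u);
    show_bytes("x", &x, sizeof x);
    return 0;
}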
Ali Shakiba

  • [Similar question](http://stackoverflow.com/questions/17312545/type-conversion-unsigned-to-signed-int-char). – Lundin Oct 26 '15 at 07:13
  • C behaves in terms of *values*, not representations. All the stuff about 2's complement and ffff and 65535 etc. is irrelevant. – M.M Oct 26 '15 at 07:21
  • Don't use code formatting for text that isn't code. – user207421 Oct 26 '15 at 07:28

4 Answers


The code above yields -1 >= 0u

All integer literals (numeric constants) have a type and therefore also a signedness. By default, they are of type int, which is signed. When you append the u suffix, you turn the literal into an unsigned int.

For any C expression where one operand is signed and the other is unsigned, the rule of balancing (formally: the usual arithmetic conversions) implicitly converts the signed type to unsigned.

Conversion from signed to unsigned is well-defined (6.3.1.3):

Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.

For example, for 32 bit integers on a standard two's complement system, the max value of an unsigned integer is 2^32 - 1 (4294967295, UINT_MAX in limits.h). One more than the maximum value is 2^32. And -1 + 2^32 = 4294967295, so the literal -1 is converted to an unsigned int with the value 4294967295. Which is larger than 0.
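
As a quick check of that rule, here is a minimal sketch (it assumes a typical platform where unsigned int is 32 bits):

#include <stdio.h>
#include <limits.h>

int main(void) {
    unsigned int v = -1;            /* -1 + (UINT_MAX + 1) = UINT_MAX */
    printf("%u\n", v);              /* 4294967295 when unsigned int is 32 bits */
    printf("%d\n", v == UINT_MAX);  /* 1, guaranteed by the conversion rule regardless of width */
    return 0;
}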


When you switch the types to short, however, you end up with small integer types. This is the difference between the two examples. Whenever a small integer type is part of an expression, the integer promotion rule implicitly converts it to int (6.3.1.1):

If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.

If short is smaller than int on the given platform (as is the case on 32 and 64 bit systems), any short or unsigned short will therefore always get converted to int, because they can fit inside one.

So for the expression if (x < u), you actually end up with if ((int)x < (int)u), which behaves as expected (-1 is less than 0).
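
A small sketch making that implicit promotion explicit (the variable names match the question's second snippet):

#include <stdio.h>

int main(void) {
    unsigned short u = 0u;
    short x = -1;

    /* Integer promotions: both operands fit in int, so both become int. */
    printf("%d\n", x < u);            /* 1 */
    printf("%d\n", (int)x < (int)u);  /* 1, the same comparison spelled out */
    return 0;
}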

Lundin
  • Thanks. This explains the case. However, I wonder why the designers have decided in this way? Do you have any ideas? – Ali Shakiba Oct 26 '15 at 07:38
  • @AliShakiba: When you encounter different operands, you can either decide you will convert both operands to `signed`, or both operands to `unsigned` before proceeding with comparison. Since `signed int` is the "default" type, using an `unsigned` literal for one of the operands implies you had a reason to specify this additional qualifier. And since C was designed to be "close to the hardware", it's natural to try to fit everything into a platform's native word size to be able to use appropriate instructions (which don't "mix" operand types and typically operate on word-size operands). – vgru Oct 26 '15 at 07:42
  • @AliShakiba The original rationale behind integer promotion was something like: if you have for example `char x = 200, y = 200;` and then do `x + y`, then the expression wouldn't overflow. Integer promotion is a type inconsistency in the C language though, and it has caused far more harm than good over the years, because silent implicit promotion bugs are _much_ harder to find than simple integer overflow bugs. Also, the implicit type promotion rules are somewhat complex, and there are therefore lots of C programmers who don't know how they work, which is unfortunate. – Lundin Oct 26 '15 at 07:44
  • @Lundin: The original rationale also noted that the majority of implementations would treat promotion of short unsigned types to signed in a fashion indistinguishable from promotion to unsigned, *even when the result was in the range `INT_MAX+1u` to `UINT_MAX`*, except in cases where the result was used in certain ways, and that almost certainly influenced the decision to make things promote as signed, since signed promotion is usually right in cases where the difference would have mattered in most existing implementations, but unsigned promotion would be right... – supercat Jul 14 '16 at 16:40
  • ...in cases where the Standard imposed no requirements but existing implementations did the right thing. I doubt the authors of the Standard would have written the rules as they did if they expected that it would become fashionable for compilers targeting silent-wraparound platforms to sometimes treat code like `uint1 = ushort1*ushort2;` in wonky fashion when the product is in the range `INT_MAX+1u` to `UINT_MAX`. – supercat Jul 14 '16 at 16:42

You're running into C's integer promotion rules.

Operators on types smaller than int automatically promote their operands to int or unsigned int. See comments for more detailed explanations. There is a further step for binary (two-operand) operators if the types still don't match after that (e.g. unsigned int vs. int). I won't try to summarize the rules in more detail than that. See Lundin's answer.

This blog post covers this in more detail, with a similar example to yours: signed and unsigned char. It quotes the C99 spec:

If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.


You can play around with this more easily on something like godbolt, with a function that returns one or zero. Just look at the compiler output to see what ends up happening.

#define mytype short

int main() {
    unsigned mytype u = 0u;
    mytype x = -1;
    return (x < u);
}
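
If you prefer to see both results at run time rather than in the compiler output, a combined variant (my own adaptation of the snippet above, with made-up variable names) might look like this:

#include <stdio.h>

int main(void) {
    unsigned short us = 0u;
    short ss = -1;
    unsigned int ui = 0u;
    int si = -1;

    /* short vs unsigned short: both promoted to int, signed comparison. */
    printf("short: %s\n", ss < us ? "-1 < 0u" : "-1 >= 0u");
    /* int vs unsigned int: si is converted to unsigned int, i.e. UINT_MAX. */
    printf("int:   %s\n", si < ui ? "-1 < 0u" : "-1 >= 0u");
    return 0;
}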
Peter Cordes
  • That's a good point. However, as the sample at the end of the question shows, `short` and `unsigned short` are each two bytes, but with different interpretations. Thanks for the link. – Ali Shakiba Oct 26 '15 at 07:12
  • @AliShakiba: The rules for integer promotion can be unintuitive. It *is* the reason why you get different results with short vars (both get promoted to int) vs. int vars (int can't represent all possible unsigned ints). – Peter Cordes Oct 26 '15 at 07:15
  • There are actually two sets of rules here: the "integer promotions" promote both `short` and `unsigned short` to `int` (on this platform), and the "usual arithmetic conversions" promote `int` to `unsigned` when it's compared against `unsigned`. Most operators will perform the "integer promotions" and then the "usual arithmetic conversions". The notable exceptions are the bit shift operators, which only do integer promotions. – Dietrich Epp Oct 26 '15 at 07:20
  • That's a good point. The extension rules in C for `unsigned` is to fill the extra new bits on the left by `0`s and for `signed` types by the `msb`. Hence, there are two cases: (1) If we first extend, then cast to unsigned and finally compare, then `0xff ff` is extended to `0xff ff ff ff` and then are compared as unsigned with `0`, leading to `-1 >= 0u`. (2) If we first cast to unsigned, then extend, `0xff ff` is extended to `0x 00 00 ff ff` and finally is compared with `0u` which I think shall be evaluated to `-1 >= 0u`. In both cases, it needs to be `-1 >= 0u`! I'm totally confused! – Ali Shakiba Oct 26 '15 at 07:21
  • Your answer suggests that the arguments are promoted because different types are given to `>`. However that is wrong. If the arguments were two shorts, then both are promoted to int still. The *integer promotions* occur first with `>` and most other binary operators: types smaller than `int` are promoted to `int`. Only then, if the types still differ, are further conversions necessary. – M.M Oct 26 '15 at 07:23
  • @M.M and Dietrich: thanks for the corrections. Updated my answer to try not to say anything wrong. – Peter Cordes Oct 26 '15 at 07:31

Contrary to what you seem to assume, this is not a property of the particular width of the types, here 2 bytes versus 4 bytes, but a question of the rules that are applied. The integer promotion rules state that short and unsigned short are converted to int on all platforms where the corresponding range of values fits into int. Since this is the case here, both values are preserved and obtain the type int. -1 is perfectly representable in int, as is 0. So the test results in -1 being smaller than 0.
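
To see that the particular width is not what matters, the same comparison with char-sized operands (assuming char is narrower than int, as on common platforms) takes the same promoted, signed path:

#include <stdio.h>

int main(void) {
    signed char x = -1;
    unsigned char u = 0;

    /* Both operands are promoted to int, so this is a signed comparison. */
    printf("%d\n", x < u);  /* 1: -1 < 0 after promotion */
    return 0;
}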

In the case of testing -1 against 0u, the common conversion chooses the unsigned type as the common type to which both are converted. -1 converted to unsigned is the value UINT_MAX, which is larger than 0u.

This is a good example of why you should never use "narrow" types for arithmetic or comparisons. Only use them if you have a severe size constraint. That will rarely be the case for simple variables, but it does apply to large arrays, where you can really gain from storing values in a narrow type.

Jens Gustedt
  • Storing data in narrow types in arrays is great. Loading values from the array into narrow local variables is not usually good, though. Loading array values into `int` local variables avoids having to worry about integer promotion rules; just the usual rules for signed vs. unsigned int. x86 at least has efficient instructions that sign-extend an int8_t or int16_t on the fly while loading from memory into a register. IDK about ARM or other important architectures. – Peter Cordes Oct 26 '15 at 07:41

0u is not unsigned short, it's unsigned int.

Edit: the explanation for the behavior, i.e. how the comparison is performed.

As answered by Jens Gustedt,

This is called "usual arithmetic conversions" by the standard and applies whenever two different integer types occur as operands of the same operator.

In essence, what it does is:

  • if the types have different width (more precisely, what the standard calls conversion rank), it converts to the wider type
  • if both types have the same width, then, apart from really weird architectures, the unsigned one wins
  • signed-to-unsigned conversion of the value -1, with whatever type, always results in the highest representable value of the unsigned type (see the short demonstration below)
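
A short demonstration of that last point (a sketch using the <limits.h> constants):

#include <stdio.h>
#include <limits.h>

int main(void) {
    /* Converting -1 to any unsigned type yields that type's maximum value. */
    printf("%d\n", (unsigned short)-1 == USHRT_MAX);  /* 1 */
    printf("%d\n", (unsigned int)-1 == UINT_MAX);     /* 1 */
    printf("%d\n", (unsigned long)-1 == ULONG_MAX);   /* 1 */
    return 0;
}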

A more explanatory blog post written by him can be found here.

Abhineet
  • Thanks @Abhineet . That's right. However, I am curious why this happened? `-1` in 4 bytes is `0xff ff ff ff` and in 2 bytes is `0xff ff`. Given them as 2s-complement which are interpreted as `unsigned`, they have corresponding values of `4294967295` and `65535`. They both are not less than `0` and I think in both cases, the output needs to be `-1 >= 0u`, i.e. `x >= u`. – Ali Shakiba Oct 26 '15 at 07:05
  • This doesn't answer the question, and doesn't explain why `int` variables give a different result from `short` variables. – Peter Cordes Oct 26 '15 at 07:08