When comparing an `int` to an `unsigned int`, for example:

signed int si = -1;
unsigned int ui = 0;

if (ui > si)
{
    // Do something
}

`si` will be converted to an `unsigned int`, and so it will compare as larger than `ui`. So why is this even allowed when the result is not what one would expect? Was it made this way for historical reasons, and if the designers had to do it over again, would they disallow it?

  • This is one reason you shouldn't mix signed and unsigned. – NathanOliver Jan 27 '15 at 21:16
  • I can only assume the default integer compare operators compare by bits for performance instead of by signed/unsigned value. This was probably done for performance reasons. I'll let a guru take a crack at explaining since I'm interested too. – PDizzle745 Jan 27 '15 at 21:17
  • If C++ converts an unsigned int to a signed int, the problem still happens. e.g. `signed int si = 0; unsigned int ui = (unsigned)-1; if (ui > si) { /* Do something */ }` – imlyc Jan 27 '15 at 21:18
  • See the [answer](http://stackoverflow.com/a/5416498/1062948) to [this](http://stackoverflow.com/q/5416414/1062948) question. – crayzeewulf Jan 27 '15 at 21:22
  • with `clang -Weverything` you will get a warning for such a comparison. – Walter Jan 27 '15 at 21:35
  • According to Bjarne Stroustrup himself: use `int` unless you have a reason not to! – notadam Jan 27 '15 at 21:36

4 Answers


C++ has the following rules for deciding the type to which the two values are converted after integer promotions have been done (chapter 5, clause 9):

  • If both operands have the same type, no further conversion is needed.
  • Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank shall be converted to the type of the operand with greater rank.
  • Otherwise, if the operand that has unsigned integer type has rank greater than or equal to the rank of the type of the other operand, the operand with signed integer type shall be converted to the type of the operand with unsigned integer type.
  • Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, the operand with unsigned integer type shall be converted to the type of the operand with signed integer type.
  • Otherwise, both operands shall be converted to the unsigned integer type corresponding to the type of the operand with signed integer type.

The third rule applies here: `int` and `unsigned int` have the same rank, so the signed operand `si` is converted to `unsigned int`.
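
A minimal sketch of what that rule does to the question's example (the exact printed value assumes a 32-bit `int`; the variable names are taken from the question):

#include <iostream>

int main()
{
    signed int si = -1;
    unsigned int ui = 0;

    // The usual arithmetic conversions turn si into an unsigned int.
    // With a 32-bit int, -1 wraps around to 4294967295 (UINT_MAX).
    std::cout << static_cast<unsigned int>(si) << '\n'; // prints 4294967295

    // So ui > si actually compares 0 > 4294967295, which is false.
    std::cout << std::boolalpha << (ui > si) << '\n';   // prints false
}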

– Sergey Kalinichenko

This rule exists because it is the best solution to the problem.

You can't compare apples and oranges. The only options are:

  • convert orange to apple
  • convert apple to orange
  • convert both to some other common type

Out of the first two options, converting both to unsigned makes more sense than converting both to signed.

How about the third option? I suppose a possibility would be to convert both values to `long` and then do the comparison. This might seem like a good idea at first, but if you think about it some more, there are some problems:

  • If `long` and `int` are the same size, then this doesn't actually help.
  • Even if `long` is bigger than `int`, you have just moved the problem off to the case of comparing a `long` with an `unsigned long`.
  • It would be harder to write portable code.

The last point is important. The historical rules about `short` and `char` being promoted to `int` are actually extremely annoying when you are writing template code or code with overloaded functions, because they change which overload is called.

We would not want to introduce any more rules of the same type (e.g. promote `int` to `long` if it is in a comparison with `unsigned int`, but only if `sizeof(long) > sizeof(int)`, yada yada yada).
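
As a sketch of how promotion already changes which overload is called (the `f` overloads here are made up for illustration):

#include <iostream>

void f(int)  { std::cout << "f(int)\n"; }
void f(long) { std::cout << "f(long)\n"; }

int main()
{
    short s = 1;
    f(s);     // s is promoted to int, so f(int) is chosen
    f(s + s); // short + short also yields int, so again f(int)
    f(1L);    // only an explicitly long argument selects f(long)
}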

– M.M
  • But why allow the implicit conversion in the first place if it can cause unexpected results? Do other (newly created) programming languages allow this kind of behavior? –  Jan 27 '15 at 21:55
  • Why does converting both to unsigned make more sense than converting both to signed? – Benjamin Lindley Jan 27 '15 at 21:55
  • @BenjaminLindley well, converting to signed invokes implementation-defined behaviour and converting to unsigned doesn't. Since converting to signed doesn't actually have any other benefit compared to converting to unsigned, this suggests that converting to unsigned is better. – M.M Jan 27 '15 at 22:00
  • @mahmoud_t1 It's only unexpected if you didn't learn from a good learning reference :) C++ is strongly typed; when looking at *any* expression, keep in mind what the type is of each sub-expression, and also keep in mind that the type must be known at compile-time. – M.M Jan 27 '15 at 22:01
  • I did not mean "unexpected" as in I wouldn't know that this behavior will happen, I meant that you as the programmer shouldn't have to worry about how data are represented, you just know that this variable contains the value `0` and that variable contains the value `-1` and the compiler should handle comparing what each variable represents. –  Jan 27 '15 at 22:06
  • But converting to signed does have a benefit. And it is demonstrated in the OP's question. The counter argument is that extremely large unsigned values would compare incorrectly. But small negative numbers are much more common than extremely large positive numbers. As far as signed to unsigned invoking implementation defined behavior, that is a bit of a circular argument, since we are asking why the language is defined a certain way. The conversion could easily have been made well defined behavior. – Benjamin Lindley Jan 27 '15 at 22:09
  • Unsigned to signed invokes implementation-defined behaviour, not the other way around. This is because there are different ways of representing signed numbers in hardware, but the same is not true for unsigned. – M.M Jan 27 '15 at 23:19
  • I wouldn't say that giving the "expected" result in the OP's case is a benefit. Programs have to work properly in all cases. It might result in code that a naive programmer thinks is correct but that then breaks when large numbers are applied. This way, the programmer ends up learning the language rules at an early stage. – M.M Jan 27 '15 at 23:24

The reason is mostly historical. C++ is big on being compatible with C code even today. You can take a C code-base and convert it verbatim to C++ and it will probably work, even though there are some minor differences and incompatibilities. C has defined it that way and C++ will not change it, because otherwise it would change the meaning of code and therefore break programs that would otherwise work.

In the current working draft (N4296) you can find the rules in section 5.10.5.

– Ralph Tandetzky

There are only two choices for the language:

  • treat the signed as unsigned
  • treat the unsigned as signed

As dasblinkenlight (Sergey Kalinichenko) says in the first answer, the language mandates the former. The reason is that it makes the code simpler. On modern machines, the top bit is the sign bit, and the hardware can perform either a signed or an unsigned comparison, so the code is just a compare followed by an unsigned conditional jump.

To treat the unsigned as signed, the compiler could throw away (mask out) the top bit in the unsigned word `ui` and then perform a signed test, but that would change its value. Alternatively, it could test the top bit of `ui` first and return greater if set, then perform the masking above.

Bottom line, the language choice was made because it's more code-efficient.
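
A small sketch of the bit-level view this answer describes (it assumes 32-bit two's complement integers, which is what modern hardware uses):

#include <cstdint>
#include <iostream>

int main()
{
    std::int32_t  si = -1;
    std::uint32_t ui = 0;

    // On two's complement hardware, -1 and the maximum unsigned value
    // share the same bit pattern, so once si is treated as unsigned,
    // a single unsigned compare instruction suffices; no masking, no branching.
    std::cout << std::hex << static_cast<std::uint32_t>(si) << '\n'; // ffffffff
    std::cout << (ui > si) << '\n'; // 0: compares 0 against 0xffffffff
}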

– Phil Mayes
  • Or it could do no masking, and simply do a signed comparison. This would leave things in a state similar to what is the case now. That is, having incorrect comparisons for a certain subset of numbers. And I would argue that the incorrect comparisons that would result from converting unsigned to signed (that is, comparisons with extremely large positive numbers) are more acceptable than the incorrect comparisons that occur now (that is, comparisons with common negative numbers like -1). – Benjamin Lindley Jan 27 '15 at 22:28
  • @Benjamin You could, and on reflection, I agree with you, on the grounds that numbers generally cluster around zero, so the majority of cases would evaluate correctly. – Phil Mayes Jan 28 '15 at 00:56