3

I could use a little help clarifying this strange comparison when dealing with vector.size(), a.k.a. size_type:

vector<cv::Mat> rebuiltFaces;
int rebuildIndex = 1;
cout << "rebuiltFaces size is " << rebuiltFaces.size() << endl;

while( rebuildIndex >= rebuiltFaces.size() ) {
    cout << (rebuildIndex >= rebuiltFaces.size()) << " , " << rebuildIndex << " >= " << rebuiltFaces.size() << endl;
    --rebuildIndex;
}


And what I get out of the console is

rebuiltFaces size is 0
1 , 1 >= 0
1 , 0 >= 0
1 , -1 >= 0
1 , -2 >= 0
1 , -3 >= 0

If I had to guess, I would say the compiler is blindly casting rebuildIndex to unsigned and the +/- bit is causing things to behave oddly, but I'm really not sure. Does anyone know?

lvicks
  • 735
  • 1
  • 5
  • 10
  • 4
    Well yes, when you compare a signed and an unsigned integer of equal size, the signed integer is cast to unsigned. It's defined by the standard. – Mysticial Aug 28 '12 at 04:06
  • 1
    You'll often get a compiler warning for doing this ... What is the question exactly? – twsaef Aug 28 '12 at 04:07
  • Here's a solution for decrementing unsigned index loops: [Unsigned integers in C++ for loops](http://stackoverflow.com/questions/9044059/unsigned-integers-in-c-for-loops) – Mysticial Aug 28 '12 at 04:08
  • And another falls afoul of the integral promotion rules. – Matthieu M. Aug 28 '12 at 07:30
  • Thanks for the insight everyone. My post was less of a question and more to gain an understanding of what was taking place. Without having read the standard (I confess), I would have assumed everything is cast to a signed value, as this makes sense to me logically. Looking at it from a computer architecture perspective, it makes sense to cast to unsigned to preserve the number range. – lvicks Sep 04 '12 at 20:28

4 Answers

3

As others have pointed out, this is due to the somewhat counter-intuitive rules C++ applies when comparing values with different signedness; the standard requires the compiler to convert both values to unsigned. For this reason, it's generally considered best practice to avoid unsigned unless you're doing bit manipulations (where the actual numeric value is irrelevant). Regrettably, the standard containers don't follow this best practice.

If you somehow know that the size of the vector can never exceed the range of an int, then you can just cast the result of std::vector<>::size() to int and be done with it. This is not without danger, however; as Mark Twain said: "It's not what you don't know that kills you, it's what you know for sure that ain't true." If there are no validations when inserting into the vector, then a safer test would be:

while ( rebuiltFaces.size() <= INT_MAX
        && rebuildIndex >= (int)rebuiltFaces.size() )

Or if you really don't expect the case, and are prepared to abort if it occurs, design (or find) a checked_cast function, and use it.
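
A minimal sketch of such a function, assuming you'd rather throw than abort (the name checked_cast, the exception chosen, and the signature are placeholders, not an existing facility):

#include <cstddef>
#include <limits>
#include <stdexcept>

// Hypothetical helper: convert a size_t to int, failing loudly if it won't fit.
inline int checked_cast(std::size_t value)
{
    if (value > static_cast<std::size_t>(std::numeric_limits<int>::max()))
        throw std::range_error("value does not fit in an int");
    return static_cast<int>(value);
}

With something like that in place, the loop condition could simply be rebuildIndex >= checked_cast(rebuiltFaces.size()), with no separate INT_MAX test.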

James Kanze
  • 150,581
  • 18
  • 184
  • 329
  • 2
    Or, an even better programming practice would be to do exactly the opposite: use *signed* types only when you really really really have to and stick to *unsigned* ones at all other times. – AnT stands with Russia Aug 28 '12 at 15:19
  • @AndreyT Except that given the way unsigned arithmetic works in C++, you almost always really, really have to use signed. The default type for integral values in C++ is `int`. That's the way the language was designed, and that's the way pretty much all experienced programmers use it. Any time you use a type other than `int` for an integral numeric value, you're screaming that some very unusual constraints are involved. – James Kanze Aug 28 '12 at 15:28
  • In my experience, it is not true. Any sort of combinatorial programming is done very naturally in unsigned types only. That's how I do it every day. In my field signed types exist primarily as geometric coordinates and that's it. – AnT stands with Russia Aug 28 '12 at 16:09
  • Calling the dominance of `int` "language design" would be a stretch. It is simply a legacy feature inherited from the ancient versions of the language (like CRM), which had no unsigned types at all. And the absence of unsigned types at that time was more of a design oversight than a conscious design decision. (If we assume that there was any "design" involved at all.) – AnT stands with Russia Aug 28 '12 at 16:10
  • The feature that really defined the design of the C and C++ languages, in my opinion, is how these languages treat *iterator ranges*. In C that would be pointer arithmetic in an array. The general principles of pointer arithmetic are aligned with unsigned arithmetic, which is why unsigned types should dominate C and C++ programs. – AnT stands with Russia Aug 28 '12 at 16:14
  • @AndreyT Unsigned types in C++ don't work very well for numeric values, and should be avoided. If `a` and `b` are `unsigned`, `a - b` may give surprising results. Comparing with a signed type (e.g. `0`) may also give surprising results. – James Kanze Aug 28 '12 at 16:35
  • @AndreyT The issue of iterator arithmetic is a good example of why the design of the STL should _not_ have used an unsigned type for `size`. Because pointer arithmetic, where it is legal, results in either a pointer or a signed type: `a - b`, if `a` and `b` are pointers or random access iterators, results in a `ptrdiff_t`, not a `size_t`. The general principles of pointer arithmetic are aligned with signed integral types: `a + (-1)` does _not_ add some horrendously big value to the pointer/iterator. – James Kanze Aug 28 '12 at 16:38
  • @AndreyT As for calling the dominance of `int` "language design", I'm not the one who did it: that's what Kernighan and Ritchie, and later Stroustrup, have said. And still say: it's not an accident that Stroustrup barely mentions `unsigned` in _Programming: Principles and Practice Using C++_. – James Kanze Aug 28 '12 at 16:40
  • No, no, no. Unsigned types work exceptionally well for numeric values (although bundling everything into one category of "numeric values" makes little sense). If `a - b` gives "surprising results" because of the unsignedness of the types involved, it might mean (and usually does) that the very presence of `a - b` in the code is an error already. The solution to that error is rethinking the code, not trying to sweep the problem under the carpet by switching to signed types. The same can be said about comparisons with signed types. The OP's code is a good example of that. – AnT stands with Russia Aug 28 '12 at 16:45
  • `0` is a bad example, since literal values are virtually typeless. The promotion in question makes sure of that. As for the history of `int` in language design, it is not really a matter of who said what. It is a matter of looking at the factual history of the language. – AnT stands with Russia Aug 28 '12 at 16:45
  • @AndreyT If the types are numeric, then `a - b` should be a well defined operation on them. If it's an error in the code, it's because you're not using the correct type. (Actually, this depends a bit on what you mean by "numeric". For things like serial numbers, whether you use signed or unsigned doesn't make much difference; for things like counters or indexes, on the other hand, using unsigned is really an error, since even if they can't be negative themselves, taking the distance, say with something like `abs( a - b )`, is perfectly reasonable.) – James Kanze Aug 28 '12 at 16:54
  • @AndreyT And `0` very definitely has a type. `int`. At least according to the standard. – James Kanze Aug 28 '12 at 16:55
  • I'm not saying that `0` has some other type. `0` is indeed an `int`. I'm just saying that barring some borderline cases, the integral promotion and other conversion rules of the language make integral constants behave in a rather "typeless" way. It is more or less true of all integral constants, but `0` is the most universal constant of them all. – AnT stands with Russia Aug 28 '12 at 17:01
  • The real problem is that integer types as a whole are broken in C++, which means there are problems with using signed integers, too. Now your algorithms have to account for negative values even if they don't make sense. On the other hand, signed integers reduce your useful space by half when doing addition and multiplication. Signed integers are also slower to divide by a constant (to see how far you can take this optimization, see http://ridiculousfish.com/blog/posts/labor-of-division-episode-iii.html). That's why I wrote the bounded::integer library: http://doublewise.net/c++/bounded/ – David Stone Oct 04 '14 at 15:39
1

On any modern computer that I can think of, signed integers are represented as two's complement. For a 32-bit int, the maximum is 0x7fffffff and the minimum is 0x80000000; this makes adding easy when the value is negative. The system works so that 0xffffffff is -1, and adding one to that causes the bits to all roll over and equal zero. It's a very efficient thing to implement in hardware.

When the number is cast from a signed value to an unsigned value, the bits stored in the register don't change. This turns a barely negative value like -1 into a huge unsigned number (unsigned max), and that would make the loop run for a long time if the code inside didn't do something that would crash the program by accessing memory it shouldn't.

It's all perfectly logical, just not necessarily the logic you expected.

Example...

$ cat foo.c
#include <stdio.h>

int main (int a, char** v) {
  unsigned int foo = 1;
  int bar = -1;
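  /* in the comparison below, bar is converted to unsigned: (unsigned)-1 is
     UINT_MAX, so foo < bar is true and "wat" is printed */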

  if(foo < bar) printf("wat\n");
  return 0;
}

$ gcc -o foo foo.c
$ ./foo
wat
$
BillRobertson42
  • 12,602
  • 4
  • 40
  • 57
  • 1
    Whether the computer stores the value as two's complement is irrelevant. `static_cast<unsigned>(-1)` is the same value on a two's complement computer as on a sign-magnitude computer. See the answer to http://stackoverflow.com/questions/809227/is-it-safe-to-use-1-to-set-all-bits-to-true – David Stone Aug 28 '12 at 04:42
  • But that's not what his code does. His code has an implicit cast, and that's the way those work. – BillRobertson42 Aug 28 '12 at 04:52
  • Do you have a citation in the standard for that? My understanding was that an implicit cast is the same as a `static_cast` in pretty much every situation, including this one. – David Stone Aug 28 '12 at 05:01
  • I do not, but the standard doesn't compile my code either. I've attached an example. If you think it works differently, I would like to know how. Thanks. – BillRobertson42 Aug 28 '12 at 05:36
  • I'm not saying that you are wrong on a two's-complement architecture. I'm saying that the architecture doesn't matter. Your test would pass on one's complement or sign-magnitude, as well, so it doesn't distinguish between our views. Comparison between `int` and `unsigned` promotes the `int` to `unsigned`, which is the same as performing `static_cast<unsigned>(value_of_int)`. – David Stone Sep 01 '12 at 00:24
  • This test will pass on any conforming implementation, regardless of whether it is two's complement (assuming 32-bit `int`, however): `int const cool = -1; unsigned const alright = 0xFFFFFFFF; assert(cool == alright);` – David Stone Sep 01 '12 at 00:25
  • What compilers conform? I've never used one. – BillRobertson42 Sep 01 '12 at 01:31
  • g++ and clang++, to name the only two I tested on my two's complement machine. I don't have access to any other type to test, but it should always pass. – David Stone Sep 01 '12 at 18:47
  • The important point is that the bits stored in the register / in memory actually can change. It's just that in two's complement, they happen to not change ever. In other systems of storing numbers, however, they will change. – David Stone Sep 01 '12 at 18:48
  • What specific architectures are you referring to? – BillRobertson42 Sep 02 '12 at 15:51
  • All of them. Two's complement stores -1 as all bits on, which when converted to `unsigned` (again, assuming 32 bit) is 0xFFFFFFFF, which has all bits on. Sign-magnitude stores -1 as the highest bit (the sign bit) on as well as the lowest bit on. When you convert -1 to `unsigned` in sign-magnitude, the bits do change because the result of the conversion is still the same, 0xFFFFFFFF (or all bits on). Similar reasoning applies to one's complement. If I invent my own architecture that has a different encoding for integers, then -1 on that platform is still 0xFFFFFFFF when converted to `unsigned`. – David Stone Sep 04 '12 at 00:23
1

In the C and C++ languages, when the unsigned type has the same or greater width than the signed type, mixed signed/unsigned comparisons are performed in the domain of the unsigned type. The signed value is implicitly converted to the unsigned type. There's nothing about the "compiler" doing anything "blindly" here. It has been like that in C and C++ since the beginning of time.

This is what happens in your example. Your rebuildIndex is implicitly converted to vector<cv::Mat>::size_type. I.e. this

rebuildIndex >= rebuiltFaces.size()

is actually interpreted as

(vector<cv::Mat>::size_type) rebuildIndex >= rebuiltFaces.size()

When a signed value is converted to an unsigned type, the conversion is performed in accordance with the rules of modulo arithmetic, which is a well-known fundamental principle behind unsigned arithmetic in C and C++.

Again, all this is required by the language; it has absolutely nothing to do with how numbers are represented in the machine, etc., or which bits are stored where.
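
For illustration only, here is that conversion rule restated as a tiny test (a sketch, using std::size_t to stand in for vector<cv::Mat>::size_type; it is not code from the question):

#include <cassert>
#include <cstddef>
#include <limits>

int main()
{
    int rebuildIndex = -1;
    // The conversion is performed modulo 2^N, as the language requires,
    // so -1 becomes the largest value the unsigned type can hold.
    std::size_t converted = static_cast<std::size_t>(rebuildIndex);
    assert(converted == std::numeric_limits<std::size_t>::max());
    return 0;
}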

AnT stands with Russia
  • 312,472
  • 42
  • 525
  • 765
  • 1
    It wasn't like that from the beginning of time, and I can remember using C compilers that converted `unsigned` to `int` when comparing the two. Which way it went tended to vary until ANSI standardized C in 1989. – James Kanze Aug 28 '12 at 09:16
  • @James Kanze: Well, it is still a matter of figuring out whether that was the intent of that compiler, an unplanned side-effect of the underlying hardware architecture, or even a plain bug. As for the language itself, one pre-standard document I see is the "C Reference Manual", and it doesn't mention any unsigned types at all. I wonder what old versions of K&R said about it, if anything... – AnT stands with Russia Aug 28 '12 at 15:15
  • I'm not sure, but I believe it was the intent of K&R. I know that there was a great deal of discussion about it when C was standardized, because there is no "correct" solution (other than banning such comparisons completely). What is certain is that during standardization, different vendors argued for what their compilers did, as if they'd done it intentionally. Whether it was actually intentional, or just defending their version of the status quo, I don't know. – James Kanze Aug 28 '12 at 15:25
0

Regardless of the underlying representation (two's complement being the most popular, but one's complement and sign magnitude are others), if you cast -1 to an unsigned type, you will get the largest number that can be represented in that type.

The reason is that unsigned 'overflow' behavior is strictly defined: the value is mapped into the range from 0 to the maximum value of the type by way of modulo arithmetic. Essentially, if the value is larger than the largest value, you repeatedly subtract one more than the maximum value (2^N for an N-bit type) until it is in range. If your value is smaller than the smallest value (0), you repeatedly add 2^N until it's in range. So if we assume a 32-bit size_t, you start with -1, which is less than 0. Therefore, you add 2^32, giving you 2^32 - 1, which is in range, so that's your final value.

Roughly speaking, C++ defines the promotion rules like this: any char or short type is first promoted to int, regardless of signedness. Then the smaller type in a comparison is converted up to the larger type. If the two types are the same size but one is signed and one is unsigned, the signed type is converted to unsigned. What is happening here is that your rebuildIndex is being converted up to the unsigned size_t: 1 is converted to 1u, 0 is converted to 0u, and -1 is converted to the largest value of type size_t.
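
A small demonstration of those two steps, assuming a 32-bit int and unsigned int (the variable names are only for illustration, and unsigned int stands in for size_t):

#include <cstdio>

int main()
{
    unsigned int size = 0;  // plays the role of rebuiltFaces.size()
    short index = -1;       // narrower than int

    // index is first promoted to int (still -1); then, because the other
    // operand is unsigned and at least as wide, -1 is converted modulo 2^32
    // to 4294967295, which is of course >= 0.
    if (index >= size)
        std::printf("%u\n", static_cast<unsigned int>(index));  // prints 4294967295

    return 0;
}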

David Stone
  • 26,872
  • 14
  • 68
  • 84