8

Often an object I use will have (signed) `int` parameters (e.g. `int iSize`) which eventually store how large something should be. At the same time, I will often initialize them to `-1` to signify that the object (etc.) hasn't been set up / hasn't been filled / isn't ready for use.

I often end up with the warning "comparison between signed and unsigned integer expressions" when I do something like `if( iSize >= someVector.size() ) { ... }`.

Thus, I nominally don't want to be using an `unsigned int`. Are there any situations where this will lead to an error or unexpected behavior?

If not, what is the best way to handle this? If I use the compiler flag `-Wno-sign-compare`, I could (hypothetically) miss a situation in which I really should be using an `unsigned int` (or something like that). So should I just use a cast when comparing with an unsigned value, e.g. `if( iSize >= (int)someVector.size() ) { ... }`?
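For concreteness, here is a minimal sketch of the kind of code I mean (the vector name and sizes are just for illustration):

```cpp
#include <cstdio>
#include <vector>

int main() {
    std::vector<int> someVector(10);
    int iSize = -1;  // sentinel: "not set up yet"

    // Triggers -Wsign-compare: iSize is converted to the unsigned type of
    // size(), so -1 becomes a huge value and the test is unexpectedly true.
    if (iSize >= someVector.size()) {
        std::printf("surprising: taken even though iSize is -1\n");
    }

    // With the cast, the comparison behaves as intended,
    // provided someVector.size() actually fits in an int.
    if (iSize >= (int)someVector.size()) {
        std::printf("not taken for iSize == -1\n");
    }
}
```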

DilithiumMatrix
  • 1
    [Absolutely](http://stackoverflow.com/questions/13600991/why-is-a-negative-int-greater-than-unsigned-int) it can result in unexpected (but defined) results. – WhozCraig Jan 27 '13 at 22:35

1 Answer

5

Yes, there are, and some are very subtle. If you are curious, you can check out this interesting presentation by Stephan T. Lavavej about arithmetic conversions and a bug in Microsoft's implementation of the STL that was caused precisely by a signed vs. unsigned comparison.

In general, the problem stems from the fact that, because of two's complement arithmetic, a small negative integral value has the same bit representation as a very big unsigned integral value (e.g. with 16-bit integers, -1 has the same representation as 0xFFFF = 65535).
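As a small illustration of that conversion (a sketch; the exact values printed depend on the platform's integer widths):

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>

int main() {
    std::int16_t  s = -1;
    std::uint16_t u = static_cast<std::uint16_t>(s);

    // The same 16-bit pattern (0xFFFF) read two different ways:
    std::cout << s << '\n';  // -1
    std::cout << u << '\n';  // 65535

    // The same effect occurs when -1 is converted to std::size_t,
    // which is what happens in a signed/unsigned comparison against size().
    std::cout << static_cast<std::size_t>(-1) << '\n';  // SIZE_MAX, e.g. 18446744073709551615
}
```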

In the specific case of checking against `size()`, why not use type `std::size_t` for `iSize` in the first place? Unsigned types give you a wider range of non-negative values, so use them.
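As a rough sketch of that approach (the class and member names are invented here), combined with a separate flag instead of the -1 sentinel, which is the alternative discussed in the comments below:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical class keeping the size unsigned and using an explicit
// flag rather than iSize == -1 to mean "not set up yet".
class Holder {
public:
    void setSize(std::size_t n) { iSize = n; sizeKnown = true; }

    bool isAtLeast(const std::vector<int>& v) const {
        // size_t vs. size_t: no signed/unsigned warning, no surprises.
        return sizeKnown && iSize >= v.size();
    }

private:
    std::size_t iSize = 0;
    bool sizeKnown = false;
};
```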

And if you do not want to declare `iSize` as `size_t`, make it clear with an explicit cast that you are aware of the nature of this comparison. The compiler is trying to do you a favor with those warnings and, as you correctly wrote, there might be situations where ignoring them would cause you a very bad headache.

Thus, if `iSize` is sometimes negative (and should be evaluated as less than any unsigned value returned by `size()`), use the idiom: `if ((iSize < 0) || ((unsigned)iSize < somevector.size())) ...`
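If `iSize` stays a signed `int` with the `-1` sentinel, the idiom can be wrapped up like this (a sketch; the helper name is made up):

```cpp
#include <iostream>
#include <vector>

// A negative iSize always counts as "less than size()".
bool lessThanSize(int iSize, const std::vector<int>& somevector) {
    return (iSize < 0) || ((unsigned)iSize < somevector.size());
}

int main() {
    std::vector<int> somevector(10);
    std::cout << lessThanSize(-1, somevector) << '\n';  // 1: sentinel treated as smaller
    std::cout << lessThanSize(10, somevector) << '\n';  // 0
}
```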

Andy Prowl
  • As I said, I use `iSize` for other purposes (with negative values). Thus it seems I explicitly don't want to cast it to an unsigned int (won't `(unsigned int)-1 > someVector.size()` be `true`?). That's why I explicitly asked if I should cast the `.size()` into a signed `int` --- are you saying that's the appropriate solution? – DilithiumMatrix Jan 27 '13 at 22:43
  • @zhermes: Yes, the appropriate solution is to cast the result of `size()` to a `signed int`: `if (iSize < (int)v.size()) ...` – Andy Prowl Jan 27 '13 at 22:45
  • 1
    @AndyProwl It's almost always appropriate, as you generally don't want to use `unsigned` except in special cases. Depending on the platform (or to ensure portability later), you may want to write and use `checked_cast`, which will verify that the value you convert is in bounds. – James Kanze Jan 27 '13 at 22:47
  • @JamesKanze: I think it is arguable that "in general you don't want to use `unsigned`". Why not? If something logically shouldn't be negative (and that happens many times), why allow it to be so? Also, unsigned integers give you a wider range of values, again under the assumption that a certain variable logically cannot assume negative values (and that's always the case for sizes). I personally dislike the trick of setting an index to -1 for indicating "not found", or "not initialized", etc. A separate variable should be used for that purpose. – Andy Prowl Jan 27 '13 at 22:51
  • 1
    @zhermes: are you "absolutely" certain that casting the result of `size()` (which is probably `unsigned long`) to a `signed int` will not result in a negative number? How certain? Was that -1% certain? :) – rici Jan 27 '13 at 22:59
  • @AndyProwl There would certainly be an argument in favor of using a cardinal type, or some other subrange type, if C++ supported it. But `unsigned` in C++ has very particular semantics, which mean that it should be avoided in any context where arithmetic is involved. – James Kanze Jan 28 '13 at 08:45
  • @rici It depends on context. On many systems, you are guaranteed that the results of `size()` will fit in an `int`. On others, you aren't; that's why I suggested using a checked comparison. (Another alternative would be to use `ptrdiff_t`. This will extend the guarantee to practically all cases.) – James Kanze Jan 28 '13 at 08:48
  • @AndyProwl "If something logically shouldn't be negative, why allow it to be so?" And if something can't be larger than 3, why allow it to be so? C++ doesn't have subrange types. And the way C++ defines arthimetic on unsigned means that it doesn't work in arithmetic expressions. (See http://aristeia/Papers/C++ReportColumns/sep95.pdf, for example.) – James Kanze Jan 28 '13 at 08:59
  • @JamesKanze: nice paper, thank you for sharing it. Now, with all due respect for SM, I think the paper (apart from being a bit outdated) makes strange assumptions: *"In particular, you have to assume they’ll write code like this: ... `int arraySize = a.length()`"*. That's like saying "Give up designing your class right, because if clients use it wrong, they will have a headache". Well, that's because they are using it wrong. Arithmetic expressions comparing signed to unsigned are bad, indeed. This doesn't mean you shouldn't use unsigned, just that you shouldn't compare them to signed. – Andy Prowl Jan 28 '13 at 11:56
  • @JamesKanze: concerning "And if something can't be larger than 3, why allow it to be so?". Because, as you said, C++ doesn't have ranges, so you just have to allow them to be so. If C++ had ranges, I'd certainly use them. With care when comparing values of different ranges, just as when you use enums. And I definitely believe that signed to unsigned comparisons are a bad heritage of C and C++ should deprecate them, but well, they seem to care a lot about backwards compatibility (maybe for good reasons, I can't judge). – Andy Prowl Jan 28 '13 at 12:00
  • @JamesKanze: A few months after Meyers wrote that paper, he wrote this http://www.aristeia.com/Papers/C++ReportColumns/feb96.pdf (in which he says "After all was said and done, I concluded that I was indeed a dufus, but at least I wasn’t an obvious dufus."). There are pros and cons, but the interface issue is definitively settled by the standard library, in which the `size` interface returns an unsigned integer. Thus, imho, the safest, simplest, and least mental effort approach is to avoid comparing `size()` with signed values. If you need an "unset" indicator, use a `bool` or an `enum`. – rici Jan 28 '13 at 15:14
  • 2
    @AndyProwl: does that lead you to re-evaluate the wisdom of "use the cast: if( iSize >= (int)someVector.size() )" ? I'd say that the correct expression, given your hypothesis "if iSize is sometimes negative (and should be evaluated as less than all unsigned int values of size())", would be `if (iSize < 0 || (unsigned)iSize < somevector.size())`, which is a precise statement of the stated goal. I stand by my claim that overloading iSize with an "unset" indicator is suboptimal. – rici Jan 28 '13 at 15:34
  • @rici: my preferred option is to have `iSize` unsigned, and use a separate flag to indicate "not found", "not initialized", etc. But yes, if that's not done, then what you propose is the right solution. – Andy Prowl Jan 28 '13 at 15:40
  • @AndyProwl. Mine, too. But the sentence I quoted was yours, and it was added to the answer, so you might want to consider an edit. – rici Jan 28 '13 at 16:13
  • @AndyProwl If C++ had ranges, I'd use them. And if `unsigned` actually worked like a real cardinal, I'd use it as well. But as it is... – James Kanze Jan 28 '13 at 16:22
  • @JamesKanze: please see the [paper by Scott Meyers](http://aristeia.com/Papers/C++ReportColumns/feb96.pdf) that was originally linked by rici. – Andy Prowl Jan 28 '13 at 16:23
  • @rici: anyway, feel free to edit any of my answers if you find mistakes :-) I do not mind. – Andy Prowl Jan 28 '13 at 16:24
  • @JamesKanze: I'd like to see some effort at standardizing a dialect of C which allowed code to specify numeric types with a level of specificity matching requirements (e.g. allowing code to specify big-endian, little-endian, or "whatever's convenient", distinguish cardinal numbers from algebraic rings, etc.), with the goal being that nearly all well-defined code which compiles under both the current dialect and the new one would behave identically, and that most code which compiles under the current dialect but not the new one could be easily adapted to compile identically in both dialects. – supercat Apr 03 '14 at 18:36
  • @JamesKanze: I can't really see any way `int` can ever expand beyond 32 bits without a means of specifying a type which behaves as an algebraic ring of integers that are congruent mod 2^32 [independent of the size of `int`], and having separate types for algebraic rings, cardinals, and "legacy unsigned integers" would allow one to define rules for rings and cardinals based upon what's appropriate for those types, rather than composing ad hoc rules based upon what compilers seem to be doing (which I think is how the original signed/unsigned rules came about). – supercat Apr 03 '14 at 18:48
  • None of the comments in favor of using unsigned types really address the most fundamental issues. If I want to know the difference between two sizes, then `abs( size1 - size2 )` should give it. If it doesn't, the type is not appropriate (see the sketch after these comments). The fact remains that C++ unsigned integral types do _not_ behave as one would expect an integral numeric type to behave, and should not be used when the variable contains numerical data. The fact that the standard containers return an unsigned type from `size()` is a historical anomaly. (Logically, `size()` should return `end() - begin()`.) – James Kanze Apr 04 '14 at 08:27
  • @supercat There are machines today where `int` is larger than 32 bits. I don't see where it is a problem. Code which depends on `int` wrapping is seriously broken, because overflowing an `int` is undefined behavior. And for what it's worth, the "original" signed/unsigned rules are not the ones the standard specifies today. Back when the C standard was being written, it was recognized that something had to give, that there wasn't a good solution, and so an arbitrary one was chosen. – James Kanze Apr 04 '14 at 08:31
  • @JamesKanze: The issue isn't code that depends on `int` wrapping, but code that depends upon `uint32_t` behaving as an algebraic ring under addition and multiplication. As the standards are written, signed `intNN_t` types which are no more than half as big as `int` behave as an algebraic ring if intermediate results are cast back to that size, but unsigned types which are between half the size of `int` and the full size of `int` do not. I don't know of any modern platforms where requiring all unsigned types to behave as rings would impede *useful* optimizations, but the standard doesn't... – supercat May 14 '15 at 20:03
  • ...require it. (*) I would consider an "optimization" useless if most cases where it's applicable would involve not-strictly-conforming code which would in practice have worked in the absence of the "optimization" but are broken by it, and which if fixed by adding ugly type hacks to force strict conformance would result in the code being no more efficient than if the optimization had never been attempted. – supercat May 14 '15 at 20:08
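Illustrating the `abs( size1 - size2 )` point from the comments above (a minimal sketch; the exact wrapped value depends on the width of `size_t`):

```cpp
#include <cstdlib>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> a(3), b(5);

    // With an unsigned size_type, a.size() - b.size() wraps around
    // instead of yielding -2, so the "difference" is a huge positive number.
    std::cout << a.size() - b.size() << '\n';  // e.g. 18446744073709551614 with a 64-bit size_t

    // Converting to a signed type first gives the expected arithmetic.
    std::cout << std::abs(static_cast<long long>(a.size()) -
                          static_cast<long long>(b.size())) << '\n';  // 2
}
```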