24

Should one ever declare a variable as an unsigned int if they don't require the extra range of values? For example, when declaring the variable in a for loop, if you know it's not going to be negative, does it matter? Is one faster than the other? Is it bad to declare an unsigned int just as unsigned in C++?

To reiterate, should it be done even if the extra range is not required? I heard they should be avoided because they cause confusion (IIRC that's why Java doesn't have them).

Celeritas
  • What's the use of doing *anything* when not necessary? – Bo Persson Sep 01 '12 at 08:51
  • @BoPersson: If someone asks "pizza or hamburger?", he probably isn't expecting a "yes" or "no" answer based on whether either one is "necessary". Similar situation here. :) – user541686 Sep 01 '12 at 09:13
  • I see this more as "should I eat if I'm not hungry?". But we might read the question differently. – Bo Persson Sep 01 '12 at 09:16
  • I find it interesting that this other question is essentially a duplicate, but comes to the opposite conclusion: http://stackoverflow.com/questions/22587451/c-c-use-of-int-or-unsigned-int – peterpi Apr 21 '17 at 09:08
  • Another related question https://softwareengineering.stackexchange.com/q/97541/105228 – Alex Che Nov 23 '22 at 11:07

7 Answers

14

The reason to use uints is that it gives the compiler a wider variety of optimizations. For example, it may replace an instance of 'abs(x)' with 'x' if it knows that x is positive. It also opens up a variety of bitwise 'strength reductions' that only work for positive numbers. If you always multiply or divide an int by a power of two, the compiler may replace the operation with a bit shift (i.e. x*8 == x<<3), which tends to perform much faster. Unfortunately, this relation only holds if 'x' is positive, because negative numbers are encoded in a way that precludes this. With ints, the compiler may apply this trick if it can prove that the value is always positive (or can be modified earlier in the code to be so). In the case of uints, this property is trivial to prove, which greatly increases the odds of it being applied.
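
A minimal sketch of the division case, which is where the sign really does matter (the function names are invented for illustration):

    // With unsigned x the compiler can emit a single logical shift; with signed x
    // it must add a fix-up so that, e.g., -9 / 8 yields -1 rather than -2.
    unsigned div8_unsigned(unsigned x) { return x / 8; }   // typically just x >> 3
    int      div8_signed(int x)        { return x / 8; }   // shift plus a sign correction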

Another example might be the equation y = 16 * x + 12. If x can be negative, then a multiply and add would be required. Yet if x is always positive, then not only can the x*16 term be replaced with x<<4, but since the term would always end with four zeros this opens up replacing the '+ 12' with a binary OR (as long as the '12' term is less than 16). The result would be y = (x<<4) | 12.

In general, the 'unsigned' qualifier gives the compiler more information about the variable, which in turn allows it to squeeze in more optimizations.

Ghost2
  • This, while unfortunately making perfect sense, is wrong. `x * 8` and `x << 3` are interchangeable even for negative signed integers, on most systems. And the compiler knows, and will use shifts. I think you meant binary OR, by the way: `(x << 4) | 12`, which is *also* valid on most systems for signed integers. –  Sep 01 '12 at 08:53
  • Just noticed the OR issue, fixed. I'm not too sure about the shift though, unless the << op is designed to handle ints. It should work for positive ints (as long as you don't overflow), but I reckon that negatives should fail. For example, -72 in a signed byte is "10111000". If you were to shift it right by 3 bits (eg /8), you get "00010111" which is +23, not the -9 (11110111) you want. – Ghost2 Sep 01 '12 at 23:17
  • That's about right shifts, where the sign bit needs to be preserved when you want to use it to divide. x86 assembly has the `sar` (shift arithmetic right) instruction for that, which behaves differently from the `shr` (shift right) instruction. That issue does not exist for left shifts. Besides, rounding complicates matters for divisions, but rounding issues don't exist for multiplications / left shifts either. –  Sep 02 '12 at 17:18
  • "The reason" - as though there's a _single_ reason? No. -1. – einpoklum May 12 '19 at 20:04
  • The primary concern of a programmer should be correct code, optimizations should only be pursued if determined necessary. Use of unsigned can introduce unintended errors. – Glen Yates Aug 17 '21 at 19:54
12

You should use unsigned integers when it doesn't make sense for them to have negative values. This is completely independent of the range issue. So yes, you should use unsigned integer types even if the extra range is not required, and no, you shouldn't use unsigned ints (or anything else) if not necessary, but you need to revise your definition of what is necessary.
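
As a minimal sketch of that principle (the function here is invented for illustration): a count can never be negative, so an unsigned type states that directly.

    #include <cstddef>
    #include <vector>

    // A count of matches can never be negative, so std::size_t expresses the intent.
    std::size_t count_even(const std::vector<int>& v) {
        std::size_t n = 0;                          // a count, never negative
        for (std::size_t i = 0; i < v.size(); ++i)
            if (v[i] % 2 == 0)
                ++n;
        return n;
    }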

juanchopanza
  • @Mehrdad except that you may have such large values that you may actually need the largest unsigned integer value. –  Sep 01 '12 at 06:49
  • @H2CO3, surely that would be a case where it _makes_ "sense for them to have negative values", yes? – paxdiablo Sep 01 '12 at 06:51
  • @H2CO3: That's what exceptions are for. Magic values are bad. – Puppy Sep 01 '12 at 06:52
  • @Mehrdad you know e.g. `int32_t` can hold a greater value than `uint16_t`? –  Sep 01 '12 at 06:52
  • @H2CO3: Yeah, so can `uint32_t`. But I wasn't recommending that in every situation; I was referring to the typical situations. As a rule of thumb, if the unsigned data type is *smaller* than an `int`, then it might make sense, in terms of speed and ease of use. But my `npos` example isn't normally like that; it wouldn't be a good idea to use `long long` instead of `size_t` just because it works. Just use the max value if at all possible. If you can't, well, then you can't. – user541686 Sep 01 '12 at 06:54
  • @H2CO3 I think allowing an integer type to be negative when it doesn't make sense just to signal out-of-range values can bring many more problems than using `npos`, which is the lesser of two evils. – juanchopanza Sep 01 '12 at 07:26
  • *"You should use unsigned integers when it doesn't make sense for them to have negative values."* Why? You could just as easily make this argument *"You should use signed integers if it doesn't make sense for them to have values greater than 32767."* – Benjamin Lindley Sep 01 '12 at 07:32
  • @BenjaminLindley I don't agree. My statement is more fundamental and completely independent of size. If you are using an 8-bit integer for indexing something, where a negative value is a logic error, then why use a signed type? – juanchopanza Sep 01 '12 at 07:38
  • @BenjaminLindley: What's so special to C about 32767? Unsigned types are **always** unsigned on **every** system. Signed integers have nothing to do with 32767 except for *very specific* data types on *some* systems. Your comparison makes no sense. – user541686 Sep 01 '12 at 08:56
  • @Mehrdad: The range of `int` is defined to be at least [-32767,32767]. – Benjamin Lindley Sep 01 '12 at 09:00
  • @BenjaminLindley: Yeah, so? The range of `signed char` is at least -128 to 127, and the range of `long` is at least -2,147,483,647 to 2,147,483,647. What does any of that have to do with "using signed integers if it doesn't make sense for them to have values greater than 32767"? – user541686 Sep 01 '12 at 09:32
  • @Mehrdad: I assumed we were talking about `int` and `unsigned int`, since that is what the OP is asking about. juanchopanza switched it to talk about integer types in general, and I didn't notice, mainly because I don't think there's much point to using other non specific integer types such as `long`. `char` should be used, but not as an integer (even if you only need one byte). – Benjamin Lindley Sep 01 '12 at 10:06
  • @BenjaminLindley maybe I shouldn't have diverted the discussion to other integer types, but my point is that OP's definition of "necessary" seems to be solely related to range, whereas I think one should consider program logic first. – juanchopanza Sep 01 '12 at 10:26
  • @BenjaminLindley: If I have a program that counts the number of characters read, can this be negative? Can I have a negative quantity of apples? I generally use `unsigned` for quantities and sizes. I've never come across a negative size of a physical thing. For example, can a tree have a negative circumference? I'm letting the compiler help in defect identification or prevention. – Thomas Matthews Sep 10 '12 at 19:44
  • @ThomasMatthews A counterargument is that you might instead be letting the compiler help to hide errors introduced by negative values, by silently converting them to extremely large positive values, and losing your ability to safely detect the negative value at its source. – underscore_d Nov 24 '18 at 15:30
  • Bjarne Stroustrup would disagree, "Choosing unsigned implies many changes to the usual behavior of integers, including modulo arithmetic, can suppress warnings related to overflow, and opens the door for errors related to signed/unsigned mixes. Using unsigned doesn't actually eliminate the possibility of negative values." – Glen Yates Aug 17 '21 at 19:57
  • @GlenYates Since posting this answer, I have changed my thinking, for many reasons. One thing that could still be an issue when using signed integers to count things (in situations where a negative value could be used to signal an error) is signed integer overflow and UB. – juanchopanza Aug 21 '21 at 07:42
8

More often than not, you should use unsigned integers.

They are more predictable with respect to overflow: unsigned overflow wraps around, whereas signed overflow is undefined behavior.
This is a huge subject of its own, so I won't say much more about it.
It's a very good reason to avoid signed integers unless you actually need signed values.

Also, they are easier to work with when range-checking -- you don't have to check for negative values.
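
A short sketch of both points (the names are invented for illustration):

    #include <limits>

    void overflow_demo() {
        unsigned u = std::numeric_limits<unsigned>::max();
        int      s = std::numeric_limits<int>::max();
        ++u;            // well defined: wraps around to 0
        ++s;            // undefined behavior: signed overflow
        (void)u; (void)s;
    }

    // Range-checking an unsigned index needs only the upper bound:
    bool index_ok(unsigned idx, unsigned size) { return idx < size; }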

Typical rules of thumb:

  • If you are writing a forward for loop with an index as the control variable, you almost always want unsigned integers. In fact, you almost always want size_t.

  • If you're writing a reverse for loop with an index as the control variable, you should probably use signed integers, for obvious reasons. Probably ptrdiff_t would do. (A sketch of both loop styles follows this list.)
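
A minimal sketch of both rules of thumb (the function is invented for illustration):

    #include <cstddef>
    #include <vector>

    void visit_both_ways(const std::vector<int>& v) {
        // Forward: size_t matches the container's size/index type.
        for (std::size_t i = 0; i < v.size(); ++i) {
            // use v[i]
        }
        // Reverse: a signed index keeps the termination test obvious.
        for (std::ptrdiff_t i = static_cast<std::ptrdiff_t>(v.size()) - 1; i >= 0; --i) {
            // use v[i]
        }
    }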

The one thing to be careful with is when casting between signed and unsigned values of different sizes.
You probably want to double-check (or triple-check) to make sure the cast is working the way you expect.
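
For instance (an illustrative sketch; the values are chosen to show the conversions):

    #include <cstdint>

    void cast_surprises() {
        std::int16_t  small = -1;
        std::uint32_t wide  = small;     // -1 becomes 4294967295
        std::uint16_t bits  = 0xFFFF;
        std::int32_t  value = bits;      // 65535; no negative value appears
        (void)wide; (void)value;
    }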

user541686
  • *"This is a huge subject of its own, so I won't say much more about it."* -- Please, at least say *something* about it, because I have no idea what you are talking about. – Benjamin Lindley Sep 01 '12 at 07:23
  • @BenjaminLindley: See [*this*](http://stackoverflow.com/a/247940/541686) (it's mostly correct) and [*this*](http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html) (it's completely correct; search for "Signed integer overflow"). – user541686 Sep 01 '12 at 07:24
  • But if, as the OP said, "the extra range is not required", overflow is not possible. If it happens by accident, how does the fact that the behavior is defined help? – Benjamin Lindley Sep 01 '12 at 07:27
  • Coz a defined overflow can not cause demons to fly out of your nose. (http://catb.org/jargon/html/N/nasal-demons.html) – Michael Anderson Sep 01 '12 at 07:37
  • @BenjaminLindley: It doesn't matter if you need the range for your data itself. Chances are, you don't. But what matters is whether you might need the range for your intermediate calculations. Chances are, you very well might. If he doesn't, well, then the point is moot (I didn't even see that edit when I wrote this). You might find [this read](http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html) interesting, though. – user541686 Sep 01 '12 at 07:38
  • @Mehrdad: If you *might* need larger integers, then you should use a larger integer type, not allow your integers to simply overflow. – Benjamin Lindley Sep 01 '12 at 08:17
  • @Mehrdad: I read the link. It's still correct to use larger integers and incorrect to allow your integers to overflow, even if the latter works okay for *some* situations. – Benjamin Lindley Sep 01 '12 at 08:37
  • "If you're writing a reverse for loop..." I've always used unsigned integers in this case too: `for(size_t i = v.size(); i--; ) { ... }` – Yakov Galka Sep 01 '12 at 09:01
  • @Mehrdad: No, not obvious at all. I've never heard those two numbers used interchangeably. I thought we were talking about overflow though? The reason that averaging 2^20 2048s comes out correct when using 32 bit unsigned ints is because no overflow occurs. So I'm not sure what your point is. Please clarify. – Benjamin Lindley Sep 01 '12 at 09:14
  • @BenjaminLindley: Ah, yes, I see where the confusion is. Going back to your first comment -- you were referring to *"the extra range"*. What I was showing you with my example was that even if you don't need the "extra range" for your *data*, you might still need it for your intermediate operations, and it can cause overflows in signed ints, which is not only wrong but also happen to be undefined (even worse). I think the two issues (overflowing the non-"extra" range versus overflowing the data type itself) got confusing here... I was making points about them separately but it got confusing. – user541686 Sep 01 '12 at 09:26
  • In cases where one can actually benefit from defined wrapping behavior, it's fine to use unsigned types. In cases where computed values will never wrap, however, being able to assume that values won't go out of range may allow a compiler to make optimizations it otherwise could not. As a simple example, when compiling for an ARM, given `short x; unsigned short y;`, if `x` and `y` both get optimized to registers R4 and R5, the statement `x++` would be one instruction (`add r4,r4,#1`) but `y++` would be two (add r5,r5,#1 / bic r5,r5,#0x10000). – supercat Dec 20 '13 at 21:35
  • @Mehrdad: If I were designing a language, I'd include both signed and unsigned types which represented abstract algebraic groups with defined wrapping behavior, and both signed and unsigned "checked number" types where overflow would be defined to trap, and "quick" signed and unsigned numeric types where overflow would be Undefined Behavior. All three cases are useful with signed numbers and with unsigned numbers, so IMHO a good languages should support all combinations. – supercat Dec 20 '13 at 22:00
  • @supercat: Yup I agree, default should be checked but there should also be unchecked types. – user541686 Dec 20 '13 at 22:55
  • "_you don't have to check for negative values._" What you mean is that you **can't** check for them, because the compiler will silently convert them into extremely large positive values, and you get lulled into a false sense of security thinking you've removed one source of error when really you've just hidden it and replaced it with another one, where errors that could previously be checked and caught at source get transmuted into subtle bugs instead. It's dangerous to think this buys you much (any?) safety. – underscore_d Nov 24 '18 at 15:32
  • @Mehrdad: Sorry, I mis-spoke. signed ints will be two's-complement, but overflow will still be undefined behavior. See [this](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1236r0.html). – einpoklum May 14 '19 at 09:23
8

int is the general purpose integer type. If you need an integer, and int meets your requirements (range [-32767,32767]), then use it.

If you have more specialized purposes, then you can choose something else. If you need an index into an array, then use size_t. If you need an index into a vector, then use std::vector<T>::size_type. If you need specific sizes, then pick something from <cstdint>. If you need something larger than 64 bits, then find a library like gmp.

I can't think of any good reasons to use unsigned int. At least, not directly (size_t and some of the specifically sized types from <cstdint> may be typedefs of unsigned int).
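
A brief sketch of those choices (the variable names are invented for illustration):

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    int           counter    = 0;   // plain int for general-purpose arithmetic
    std::size_t   arr_index  = 0;   // index into an array
    std::uint32_t wire_field = 0;   // a field that must be exactly 32 bits

    void example(const std::vector<int>& v) {
        std::vector<int>::size_type n = v.size();   // index/size of a vector
        (void)n;
    }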

Benjamin Lindley
7

The problem with the systematic use of unsigned when values can't be negative isn't that Java doesn't have unsigned; it is that expressions with unsigned values, especially when mixed with signed ones, sometimes give confusing results if you think of unsigned as an integer type with a shifted range. Unsigned is a modular type, not a restriction of integers to positive or zero.

Thus the traditional view is that unsigned should be used when you need a modular type or for bitwise manipulation. That view is implicit in K&R — look at how int and unsigned are used — and made more explicit in TC++PL (2nd edition, p. 50):

The unsigned integer types are ideal for uses that treat storage as a bit array. Using an unsigned instead of an int to gain one more bit to represent positive integers is almost never a good idea. Attempts to ensure that some values are positive by declaring variables unsigned will typically be defeated by the implicit conversion rules.
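
A small sketch of how the conversion rules defeat such attempts (assuming a 32-bit unsigned):

    #include <iostream>

    int main() {
        unsigned u = 0;
        int      i = -1;
        // i is converted to unsigned (a huge value), so the comparison is false:
        std::cout << (i < u) << '\n';   // prints 0
        // unsigned arithmetic is modular:
        std::cout << (u - 1) << '\n';   // prints 4294967295
        return 0;
    }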

AProgrammer
2

In almost all architectures the cost of a signed operation and an unsigned operation is the same, so efficiency-wise you won't get any advantage from using unsigned over signed. But as you pointed out, if you use unsigned you will have a bigger range.

knightrider
  • "If you use unsigned you'll have a bigger range" - that's only true for integers with the same length. A signed long can hold a greater value than an unsigned byte. –  Sep 01 '12 at 06:55
  • @H2CO3 of course the base type should be the same (comparison is valid only between int and int, long and long, ...) – knightrider Sep 01 '12 at 06:58
  • I think you'll find the range (max-min) is exactly the same between signed and unsigned forms (at least for two's complement forms). – Michael Anderson Sep 01 '12 at 07:10
  • @MichaelAnderson if you mean the total number of points, yes you are right. – knightrider Sep 01 '12 at 07:12
2

Even if you have variables that should only take non-negative values, unsigned can be a problem. Here is an example. Suppose a programmer is asked to write code that prints all pairs of integers (a, b) with 0 <= a < b <= n, where n is a given input. An incorrect attempt is:

for (unsigned b = 0; b <= n; b++)
   for (unsigned a=0; a <=b-1; b++)
       cout << a << ',' << b << n ;

This is easy to correct, but thinking with unsigned is a bit less natural than thinking with int.
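
For completeness, one corrected version (a sketch assuming the same surrounding context as the snippet above):

    for (unsigned b = 0; b <= n; b++)
        for (unsigned a = 0; a < b; a++)        // a < b avoids b-1, so nothing can wrap around
            cout << a << ',' << b << '\n';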

Pierre
  • I presume you mean `b-1` will be `UINT_MAX` on the first iteration, thus something very different from intended. But the code is wrong for other reasons. Why would you _think_ `a < b` but then write `a <=b-1`? you introduce an error by using a pointlessly cryptic expression; the former would work. And why increment `b` in the inner loop, instead of `a`? (and why use postincrement... I guess because all 2-bit tutorials still do, sadly) – underscore_d Nov 24 '18 at 15:39