72

I'd like to better understand why one would choose int over unsigned.

Personally, I've never liked signed values unless there is a valid reason for them, e.g. the count of items in an array, the length of a string, or the size of a memory block. Such quantities cannot possibly be negative, so a negative value has no possible meaning there. Why prefer int when it is misleading in all such cases?

I ask this because both Bjarne Stroustrup and Chandler Carruth gave the advice to prefer int over unsigned here (at approx. 12:30).

I can see the argument for using int over short or long - int is the "most natural" data width for the target machine architecture.

But signed over unsigned has always annoyed me. Are signed values genuinely faster on typical modern CPU architectures? What makes them better?

Rapptz
Mordachai
  • For the benefit of the readers who can't watch the 1 hour video now: What do Stroustrup and Carruth say about why they prefer signed? – us2012 Sep 13 '13 at 21:22
  • I was just going to add that! Around 12:00 in. – Mordachai Sep 13 '13 at 21:23
  • For me, the reason is that unsigned is large enough for most cases, and I don't want to worry about integer overflow errors unless I have to. – Tharwen Sep 13 '13 at 21:24
  • I am guessing it is more about personal style than anything. There are probably lots of instances where int could be replaced by unsigned. In most of those instances it wouldn't make any difference, so it is just down to personal style. And ... int is shorter to type than unsigned. – Rob Sep 13 '13 at 21:26
  • I prefer `int` instead of `unsigned` because: 1. it's shorter (I'm serious!), 2. it's more generic and more intuitive (i. e. I like to be able to assume that `1 - 2` is -1 and not some obscure huge number), 3. what if I want to signal an error by returning an out-of-range value? –  Sep 13 '13 at 21:27
  • @H2CO3 answer the question then – aaronman Sep 13 '13 at 21:28
  • I've seen some arguments about potential bugs like `for(unsigned x = ...; x >= 0; x--)`. I wouldn't consider them valid. – zch Sep 13 '13 at 21:28
  • An integer overflow with a signed type usually looks more obvious at first glance than one with an unsigned type, and has a greater likelihood of creating a crash than a silent bug. – Philipp Sep 13 '13 at 21:28
  • Stroustrup says he prefers to use int even for naturally unsigned quantities! I've been ridiculed on forums for saying exactly the same thing. unsigned should be removed from the C++ language. This is one thing Java gets right. – john Sep 13 '13 at 21:29
  • At 11:08: "there's no simple guidance that can be given" – Robᵩ Sep 13 '13 at 21:29
  • @Robᵩ but then the fellow further down the line does give simple guidelines. – ChiefTwoPencils Sep 13 '13 at 21:30
  • @john Ok, but then they at least need to make `int` overflow defined behaviour and make twos-complement mandatory. Otherwise `int` would be plain unusable for many reasonable tasks for which there's no way around `unsigned int`s, as it is now. – Christian Rau Sep 13 '13 at 21:32
  • @ChristianRau Those would both be improvements to C++ too. – john Sep 13 '13 at 21:40
  • @john: Preferring `int` to `unsigned` for most purposes is one thing; removing `unsigned` from the language is far more drastic. – Keith Thompson Sep 13 '13 at 21:49
  • @ChristianRau: A machine which has 64-bit registers may store an `int32` in a register, and increment it just by incrementing that register, precisely *because* out-of-bound signed arithmetic is Undefined Behavior. By contrast, incrementing a register-stored `uint32` would require masking the value with 0xFFFFFFFF after the increment--an extra step. – supercat Sep 13 '13 at 21:49
  • @ChristianRau: Making two's-complement mandatory would make it difficult to have a conforming C++ implementation on non-two's-complement hardware, and perhaps impossible to have an *efficient* implementation. Such hardware is rare and probably getting rarer. It's an interesting question whether it's rare enough (yet) to justify such a change. – Keith Thompson Sep 13 '13 at 21:51
  • @KeithThompson Of course I realize that in the real world it's a ridiculous thing to suggest. But if we were starting over I think it would have be better for unsigned to be left out of C and C++. – john Sep 13 '13 at 21:52
  • @john That's absurd (even if you acknowledge it is absurd given history); it would have been absurd in the first place. C/C++ allows close-to-the-metal programming - including being able to inter-operate with register-based devices on the bus, do flag manipulation, etc. Unsigned avoids the sign-carry problems of signed values, which for me are far more common than being "surprised" by underflow. – Mordachai Sep 13 '13 at 21:56
  • @john: Early C (several years before K&R1) actually didn't have `unsigned`. Programmers resorted to treating pointers as unsigned integers (the language was much less strictly typed than it is now). – Keith Thompson Sep 13 '13 at 21:56
  • @KeithThompson To be honest I was more making a case for `unsigned`s than one for twos-complement by standard. While I for myself s**t on non twos-complement hardware, I also know that this is just my opinion. – Christian Rau Sep 13 '13 at 23:00
  • @john Well, if starting over we would be better off dropping a bunch of things (first of all C compatibility), but definitely not unsigned types. But ok, I also agree this might be subjective. For me, for someone who cannot understand the oh-so-unexpected underflow "issues" of unsigned types, those are probably not the least of the "problems" with the language; but this may be just a matter of taste. – Christian Rau Sep 13 '13 at 23:05
  • I seem to recall that, at least on some platforms, instructions dealing with unsigned values can, in certain situations, be marginally slower than their signed counterparts, due to the different requirements for setting certain flags, generating potential traps, etc... Without digging out the Intel/AMD/SPARC/etc. manuals, though, I might be a bit off base... – twalberg Sep 19 '13 at 20:35
  • @ChristianRau: If a new language were being designed for the same purposes as C, I would suggest that it should have separate "natural number" and "algebraic ring" types, with identical representations but different promotion and implicit conversion rules. Adding 8-bit natural number "5" to 32-bit integer 256001 should yield a 32-bit integer 256006. Adding 8-bit ring member "5" to 32-bit integer 256001 should yield 8-bit ring member "6" (since (256001+5) mod 256 is 6). – supercat Aug 03 '15 at 21:50

13 Answers

40

As per requests in comments: I prefer int instead of unsigned because...

  1. it's shorter (I'm serious!)

  2. it's more generic and more intuitive (i. e. I like to be able to assume that 1 - 2 is -1 and not some obscure huge number)

  3. what if I want to signal an error by returning an out-of-range value?

Of course there are counter-arguments, but these are the principal reasons I like to declare my integers as int instead of unsigned. Of course, this is not always true; in other cases, an unsigned is simply the better tool for the task. I am just answering the "why would anyone prefer defaulting to signed" question specifically.
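
For illustration, a minimal sketch of point 2 (the printed value assumes a 32-bit unsigned int):

#include <iostream>

int main() {
    int a = 1 - 2;             // -1, as intuition suggests
    unsigned int b = 1u - 2u;  // wraps modulo 2^32: 4294967295, not -1
    std::cout << a << ' ' << b << '\n';
}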

  • I think it's sad that saying it's shorter needs the (I'm serious). – ChiefTwoPencils Sep 13 '13 at 21:31
  • @BobbyDigital Indeed. Less worrying should be carried out about "efficiency" and more about correctness, readability and style in general. –  Sep 13 '13 at 21:32
  • Good reasons, but for reason 1, you can always typedef `uint` as `unsigned int`. – us2012 Sep 13 '13 at 21:37
  • @us2012 For some reason (name collisions?) I don't like typedeffing `uint`. –  Sep 13 '13 at 21:40
  • @H2CO3: If the positive range of `int` is sufficient for your purposes, then `UINT_MAX` is a perfectly good out-of-range value for indicating error conditions. In fact `-1` can be used in the code for that purpose, since it evaluates to `UINT_MAX` when converted to `unsigned`. – AnT stands with Russia Sep 13 '13 at 21:42
  • typing 'unsigned' is not exactly carpal-tunnel inducing ;) – Mordachai Sep 13 '13 at 21:42
  • @Mordachai It's not about writability, it's about readability. –  Sep 13 '13 at 21:44
  • @AndreyT Well, fair enough - but how do you store that in an `int`? (or am I being ignorant...?) –  Sep 13 '13 at 21:45
  • @H2CO3: You don't store it in an `int`. I'm referring to your 3rd point about returning an out-of-range value. If your function is declared as returning an `unsigned`, you still have a perfectly good out-of-range value - `UINT_MAX`. And in order to make it clearer that you want to indicate an error condition, you can actually use return values of `-1`, `-2` etc. with `unsigned` types. The behavior of `unsigned` types with such values is well-defined. – AnT stands with Russia Sep 13 '13 at 22:20
  • @AndreyT [I know that they're defined](http://stackoverflow.com/questions/18795453/why-prefer-signed-over-unsigned-in-c/18795559?noredirect=1#comment27716845_18795568). Sure, this argument of yours is valid. The implicit conversions going on still make me lean towards signed integers, though (except when unsigned ones are a better fit, e. g. bit manipulation and solving certain mathematical problems). –  Sep 13 '13 at 22:25
  • I think point 3 is quite weak (at least in C++) where there are generally much better ways to achieve this. Points 1 & 2 I agree with (that said, I prefer `unsigned`). – Konrad Rudolph Sep 14 '13 at 11:25
  • @KonradRudolph Fair enough. We have exceptions, past-one iterators, etc. in C++, and generally those are to be preferred. One may not want a simple two-liner helper function to deal with exceptions and stuff, though. –  Sep 14 '13 at 11:41
33

Let me paraphrase the video, as the experts said it succinctly.

Andrei Alexandrescu:

  • No simple guideline.
  • In systems programming, we need integers of different sizes and signedness.
  • Many conversions and arcane rules govern arithmetic (like for auto), so we need to be careful.

Chandler Carruth:

  • Here are some simple guidelines:
    1. Use signed integers unless you need two's complement arithmetic or a bit pattern
    2. Use the smallest integer that will suffice.
    3. Otherwise, use int if you think you could count the items, and a 64-bit integer if it's even more than you would want to count.
  • Stop worrying and use tools to tell you when you need a different type or size.

Bjarne Stroustrup:

  • Use int until you have a reason not to.
  • Use unsigned only for bit patterns.
  • Never mix signed and unsigned.

Wariness about signedness rules aside, my one-sentence takeaway from the experts:

Use the appropriate type, and when you don't know, use an int until you do know.
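
To see why "never mix signed and unsigned" made every expert's list, here is a small sketch of the classic pitfall (my example, not from the video; it follows from the standard conversion rules):

#include <iostream>

int main() {
    int s = -1;
    unsigned int u = 1;
    // The usual arithmetic conversions turn s into a huge unsigned value,
    // so this prints 0 (false) even though -1 < 1 mathematically.
    std::cout << (s < u) << '\n';
}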

Prashant Kumar
  • I find this answer interesting; however, could you elaborate a bit on "use `int` if you think you could count the items"? In particular, doesn't this clash with the rule of "never mix signed with unsigned" when we have to compare with `size_t` variables? – Alberto Moriconi Sep 16 '13 at 19:23
  • He's just quoting the answers given by the speakers in the video in my OP. They do come back and touch on this topic a second time, including Herb Sutter saying that in the case of size_t, the standards library "got it wrong... sorry for that." – Mordachai Sep 16 '13 at 20:48
  • @Alberto About the "use `int` if you think you could count the items", we are actually contrasting it against the use of a type like (signed) `long`, which wouldn't challenge the "never mix signed with unsigned" rule. – Prashant Kumar Sep 16 '13 at 21:04
20

Several reasons:

  1. Arithmetic on unsigned always yields unsigned, which can be a problem when subtracting integer quantities that can reasonably yield a negative result — think subtracting money quantities to yield a balance, or array indices to yield the distance between elements. If the operands are unsigned, you get a perfectly defined, but almost certainly meaningless result, and a result < 0 comparison will always be false (of which modern compilers will fortunately warn you).

  2. unsigned has the nasty property of contaminating the arithmetic where it gets mixed with signed integers. So, if you add a signed and unsigned and ask whether the result is greater than zero, you can get bitten, especially when the unsigned integral type is hidden behind a typedef.
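
A short sketch of reason 1 (the printed value assumes a 64-bit size_t):

#include <cstddef>
#include <iostream>

int main() {
    std::size_t i = 3, j = 7;
    std::cout << i - j << '\n';        // wraps: 18446744073709551612, not -4
    std::cout << (i - j < 0) << '\n';  // always 0: an unsigned value is never < 0
}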

user4815162342
  • #2 has bit me once. Aaaargh! –  Sep 13 '13 at 21:31
  • `signed1 - signed2` isn't safe either, because if it overflows you get undefined behavior. – Ben Voigt Sep 13 '13 at 21:33
  • I think these are the real reasons. – MirroredFate Sep 13 '13 at 21:36
  • The signed case won't overflow unless `signed1` and/or `signed2` is "large" (more than half the maximum representable value). By contrast, subtracting *anything* from an unsigned value can cause it to wrap. – supercat Sep 13 '13 at 21:52
  • I wonder why C++ doesn't allow one to ask for exception generation when doing overflow (and underflow) on ints and unsigneds? CPUs know when it happens (or they did back when I programmed assembler) - so presumably these things could be trapped. – Mordachai Sep 13 '13 at 22:00
  • @BenVoigt, true but it's much less likely. – Mark Ransom Sep 13 '13 at 22:15
  • @Mordachai: There are many reasons not to create this exception. It would prevent the compiler from making many useful optimizations. See http://blog.regehr.org/archives/213 for discussion. – Rob Napier Sep 13 '13 at 22:54
  • @Mordachai No, the CPU doesn't know, and never has. Fact is, it does not even know about signedness until it encounters a comparison or division. Then, and only then, will the opcode specify whether it should assume its operands to be signed or unsigned. All the rest of the time, it simply _ignores_ any overflows that are generated. – cmaster - reinstate monica Sep 13 '13 at 22:56
  • To @supercat's point, not only *can* it cause it to wrap surprisingly, it often does. Try writing a loop that counts down rather than counts up. People almost always screw that up when using unsigned integers. – Rob Napier Sep 13 '13 at 22:56
  • @Mordachai: Also, which CPU? I had a CS book somewhere that claimed that no ALU can detect overflows, so while it is true now on major architectures it wasn't always necessarily the case (C was created around 1970 as de-facto portable assembly, and the ANSI standard is from '89, so it required high portability). – Maciej Piechotka Sep 13 '13 at 22:57
  • @H2CO3, you're very lucky that it's only bitten you once (or you've done much less programming than I think you have :D) I've seen this bug many, many times in code bases. – Rob Napier Sep 13 '13 at 22:57
  • @RobNapier `for(unsigned i = count; i--; ) //whatever` Shortest loop construct there is, and it is absolutely happy with unsigned :-) – cmaster - reinstate monica Sep 13 '13 at 22:58
  • @cmaster: x86 for example sets separate flags for signed and unsigned overflow IIRC (this might not be true for all architectures however). Setting aside the performance it would be trivial to conditionally jump and throw exception (as compiler generating code would know the difference). – Maciej Piechotka Sep 13 '13 at 23:00
  • @cmaster: Not all machines which can run C wrap in case of signed arithmetic overflow. Some have instructions which saturate instead, and I believe some define "MIN_INT" as "-MAX_INT" and use "-MAX_INT-1" as an integer equivalent of NaN. – supercat Sep 13 '13 at 23:01
  • @MaciejPiechotka Ah, I didn't know that about x86, I learned on the PowerPC. Of course, you could pair every single arithmetic instruction with a conditional jump, but I for one am happy that compilers don't do something like that. That would be the kind of philosophy that made Java so excruciatingly slow... – cmaster - reinstate monica Sep 13 '13 at 23:05
  • @cmaster Neither am I for the optimized builds. If I use C/C++ instead of some HL language I probably care about speed, and I expect (at least some) micro-optimization from the compiler. If I don't, then there are other, more suitable, languages. Even on PPC the compiler might detect the overflow via casts or simply use more bits for the computation. – Maciej Piechotka Sep 13 '13 at 23:10
  • @RobNapier: Interesting link. I think a good language design should be dictated by the philosophy that implicit transforms should be applied in cases where there's a clear meaning, and a programmer couldn't plausibly expect anything else (e.g. `float f=0.1;`) but should not be allowed in cases where a programmer more likely intends something else (e.g. in `double d=f1+f2;`, it's not terribly likely that the programmer particularly wants the addition to be done in `float`. The programmer might want "whatever's fastest" or might want the operands promoted to `double` first). – supercat Sep 13 '13 at 23:24
  • @RobNapier: For a language to be both portable and efficient, it's often necessary to let implementations do what's convenient; that shouldn't necessarily imply full-blown "undefined behavior", however. A standard could e.g. provide that when a signed arithmetic computation goes out of bounds, any subsequent attempts to use the result may yield any arbitrary values (which may or may not be within the range of the type); storing the result to a `volatile` would store some discrete value of the type, but if it's written to a normal variable and read twice, the reads need not match. – supercat Sep 13 '13 at 23:31
  • @RobNapier Ehh :D Well, maybe it's just that I haven't worked on enough projects sufficiently large to get myself this nice li'l bug... I tend to be able to pay attention almost perfectly when writing a short code snippet, but this superpower quickly vanishes as I approach the 2000 line/file limit... :P –  Sep 14 '13 at 05:11
  • Concerning 1.: the result is not meaningless at all. Assuming you cast the result of the difference to the unsigned type (to deal with implicit promotion), the result is equal to the difference in 2^N modular arithmetic. This means, for example, that comparing it to zero makes a lot of sense - it will be zero iff the operands of the difference are equal. Actually you can pretend you live in a two's complement world and do stuff like: uint32_t a = 5; uint32_t b = -1; uint32_t c = a + b; and behold, c==4. – Ambroz Bizjak Sep 16 '13 at 19:25
  • The modular behavior of unsigned arithmetic is useful for representing time, where you allow time to overflow any number of times, but you can still compute differences between any two points in time as long as they are not too far apart. – Ambroz Bizjak Sep 16 '13 at 19:30
  • @AmbrozBizjak Unsigned arithmetic is useful for many things, just not as a general-purpose representation of integers. As for #1, I'm aware of modular arithmetic, which is why the answer states that the result is "well defined", only meaningless for the domain of the chosen example. – user4815162342 Sep 16 '13 at 19:42
18

There are no reasons to prefer signed over unsigned, aside from purely sociological ones, i.e. some people believe that average programmers are not competent and/or attentive enough to write proper code in terms of unsigned types. This is often the main reasoning used by various "speakers", regardless of how respected those speakers might be.

In reality, competent programmers quickly develop and/or learn the basic set of programming idioms and skills that allow them to write proper code in terms of unsigned integral types.

Note also that the fundamental differences between signed and unsigned semantics are always present (in superficially different form) in other parts of the C and C++ languages, like pointer arithmetic and iterator arithmetic. This means that in the general case the programmer does not really have the option of avoiding the issues specific to unsigned semantics and the "problems" it brings with it. I.e. whether you want it or not, you have to learn to work with ranges that terminate abruptly at their left end, right at zero rather than somewhere off in the distance, even if you adamantly avoid unsigned integers.

Also, as you probably know, many parts of the standard library already rely on unsigned integer types quite heavily. Forcing signed arithmetic into the mix, instead of learning to work with unsigned arithmetic, will only result in disastrously bad code.

The only real reason to prefer signed in some contexts that comes to mind is that in mixed integer/floating-point code signed integer formats are typically directly supported by the FPU instruction set, while unsigned formats are not supported at all, forcing the compiler to generate extra code for conversions between floating-point values and unsigned values. In such code signed types might perform better.

But at the same time in purely integer code unsigned types might perform better than signed types. For example, integer division often requires additional corrective code in order to satisfy the requirements of the language spec. The correction is only necessary in case of negative operands, so it wastes CPU cycles in situations when negative operands are not really used.

In my practice I devotedly stick to unsigned wherever I can, and use signed only if I really have to.
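
To illustrate the kind of idioms referred to above (my examples, echoing the comment discussion below):

#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{1, 2, 3};

    // Descending loop with an unsigned index: test, then decrement,
    // so the index never needs to go below zero.
    for (std::size_t i = v.size(); i-- > 0; )
        std::cout << v[i] << '\n';

    // Range checks written "signedness-independently": i < 3, not i - 3 < 0.
    std::size_t k = 1;
    if (k < 3)
        std::cout << "in range\n";
}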

AnT stands with Russia
  • I disagree. It's not about competency, but about what is common. (like when to use class vs struct) There are loads of competent programmers out there that could perfectly tell you when they could use an unsigned or signed value, but use signed anyway for these "sociological" reasons. (I'd argue even indentation is used for this purpose - yes, the purpose is to make code easier to read, but that's the point of `int` as well). – Luchian Grigore Sep 13 '13 at 21:53
  • I tend to agree with the comment, as I use unsigned whenever the values of the variable are going to be non-negative, as in a loop over only positive values `for (unsigned int i=0; i < 5; ++i)`. I feel it gives it a bit of an extra type specifier, but I also see your point that plain int by itself makes the code more succinct. – bjackfly Sep 13 '13 at 23:02
  • Don't use unsigned even if you are confident that the variable will never be negative, even in loops like the above. Imagine that someone adds something like this in the body of the loop: if(i-3<0){/*something for the middle of the range*/}. If i is unsigned, the above code would never be executed. Yes, the above code assumes that "someone" is incompetent in using unsigned, but that happens more often than one would like. – Michael Sep 16 '13 at 20:29
  • @Michael: Not really. This is one of those fake wisdoms that just "sound right". Like the one that people use to justify the "Yoda comparison" syntax, for one example. E.g. they say that one should write `3 == x` instead of `x == 3` to avoid accidentally using assignment instead of `==`. But in reality it is a fake problem that never happens. People who use normal syntax `x == 3` simply don't make that mistake. The same thing with `unsigned`. A competent developer will never write code like `i - 3 < 0`, when the natural way to express it is `i < 3` and it is "signedness-independent". – AnT stands with Russia Sep 16 '13 at 20:55
  • @AnT: Subtracting and comparing with zero may not be particularly useful, but subtracting and comparing with another number may be. I would consider `(uint32_t)(x-y) < z` to be a reasonable way of checking whether `y` is within a certain distance of `x` but not below, though it would be better if it could be written idiomatically without having to name a particular type (while `0u+x-y < z` should work in all cases where `x` and `y` are the same unsigned type, regardless of whether it's larger or smaller than `int`, I don't think the `0u+` is considered a recognized idiom). – supercat Aug 29 '16 at 14:42
  • `But in reality it is a fake problem that never happens` - well, I guess the dozen times it happened to me or coworkers and cost hours of debugging just don't count then. – Jean-Michaël Celerier Mar 07 '18 at 09:45
9

The integral types in C and many languages which derive from it have two general usage cases: to represent numbers, or represent members of an abstract algebraic ring. For those unfamiliar with abstract algebra, the primary notion behind a ring is that adding, subtracting, or multiplying two items of a ring should yield another item of that ring--it shouldn't crash or yield a value outside the ring. On a 32-bit machine, adding unsigned 0x12345678 to unsigned 0xFFFFFFFF doesn't "overflow"--it simply yields the result 0x12345677 which is defined for the ring of integers congruent mod 2^32 (because the arithmetic result of adding 0x12345678 to 0xFFFFFFFF, i.e. 0x112345677, is congruent to 0x12345677 mod 2^32).

Conceptually, both purposes (representing numbers, or representing members of the ring of integers congruent mod 2^n) may be served by both signed and unsigned types, and many operations are the same for both usage cases, but there are some differences. Among other things, an attempt to add two numbers should not be expected to yield anything other than the correct arithmetic sum. While it's debatable whether a language should be required to generate the code necessary to guarantee that it won't (e.g. that an exception would be thrown instead), one could argue that for code which uses integral types to represent numbers such behavior would be preferable to yielding an arithmetically-incorrect value and compilers shouldn't be forbidden from behaving that way.

The implementers of the C standards decided to use signed integer types to represent numbers and unsigned types to represent members of the algebraic ring of integers congruent mod 2^n. By contrast, Java uses signed integers to represent members of such rings (though they're interpreted differently in some contexts; conversions among differently-sized signed types, for example, behave differently from conversions among unsigned ones), and Java has neither unsigned integers nor any primitive integral types which behave as numbers in all non-exceptional cases.

If a language provided a choice of signed and unsigned representations for both numbers and algebraic-ring numbers, it might make sense to use unsigned numbers to represent quantities that will always be positive. If, however, the only unsigned types represent members of an algebraic ring, and the only types that represent numbers are the signed ones, then even if a value will always be positive it should be represented using a type designed to represent numbers.

Incidentally, the reason that (uint32_t)-1 is 0xFFFFFFFF stems from the fact that casting a signed value to unsigned is equivalent to adding unsigned zero, and adding an integer to an unsigned value is defined as adding or subtracting its magnitude to/from the unsigned value according to the rules of the algebraic ring, which specify that if X=Y-Z, then X is the one and only member of that ring such that X+Z=Y. In unsigned math, 0xFFFFFFFF is the only number which, when added to unsigned 1, yields unsigned zero.
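
A tiny sketch of that last paragraph (my example):

#include <cstdint>
#include <iostream>

int main() {
    std::uint32_t x = static_cast<std::uint32_t>(-1);
    std::cout << x << '\n';   // 4294967295, i.e. 0xFFFFFFFF
    std::uint32_t y = x + 1;  // wraps to 0: x is the unique ring member with x + 1 == 0
    std::cout << y << '\n';
}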

supercat
  • Nitpicking: Fields allow for division by anything except the additive identity. If all you have is `+`, `-`, and `*`, the algebraic structure is that of a *ring*. –  Sep 13 '13 at 22:52
  • @ChrisWhite: Thanks. Corrected above. It's been ages since I've taken abstract algebra; I'd originally said "group", but groups don't support multiplication. – supercat Sep 13 '13 at 22:57
  • @Chris: But unsigned integral types DO have division by anything except the additive identity -- it just is based on natural arithmetic with rounding, and not modular equivalence classes. – Ben Voigt Sep 14 '13 at 00:27
  • @BenVoigt Of course, of course. But that "division" is not the inverse of multiplication, and so doesn't make the set a field. But this is all semantics, and I think we all know what we're talking about :) –  Sep 14 '13 at 04:10
  • @Chris: Can you imagine the confusion that would result if C++ actually had Galois Field division on one of its primitive types? – Ben Voigt Sep 14 '13 at 05:17
  • @BenVoigt: Aside from the fact that relatively few processors have instructions to handle such things efficiently, I wouldn't see any problem with a language including primitive types with "unusual" arithmetic behaviors (e.g. reverse-carry addition, Galois-field multiplication, etc.). Implicit conversions should be allowed in cases not involving operator or method overloads, but should not be considered when overloads exist which would not require them. – supercat Sep 16 '13 at 00:03
8

Speed is the same on modern architectures. The problem with unsigned int is that it can sometimes generate unexpected behavior. This can create bugs that wouldn't show up otherwise.

Normally when you subtract 1 from a value, the value gets smaller. But with both signed and unsigned int variables, there is one point at which subtracting 1 creates a value that is MUCH LARGER. The key difference between unsigned int and int is that with unsigned int the value that generates the paradoxical result is a commonly used value --- 0 --- whereas with signed the number is safely far away from normal operations.
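
A sketch of where the paradoxical value sits for each type (my example; the values assume 32-bit ints):

#include <climits>
#include <iostream>

int main() {
    unsigned int u = 0;
    --u;                     // wraps right next to an everyday value: u is now 4294967295
    std::cout << u << '\n';
    // For int, the trouble spot is INT_MIN, far from everyday values
    // (and decrementing past it is undefined behavior).
    std::cout << INT_MIN << '\n';
}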

As far as returning -1 for an error value --- modern thinking is that it's better to throw an exception than to test for return values.

It's true that if you properly defend your code you won't have this problem, and if you use unsigned religiously everywhere you will be okay (provided that you are only adding, and never subtracting, and that you never get near UINT_MAX). I use unsigned int everywhere. But it takes a lot of discipline. For a lot of programs, you can get by with using int and spend your time on other bugs.

vy32
  • "The problem with `unsigned int` is that it can sometimes (in cases of overflow) generate unexpected behavior." And the problem with `signed int` is that it can sometimes (in cases of overflow) generate undefined behavior. Given those choices, `unsigned` looks pretty nice ;) – Ben Voigt Sep 13 '13 at 21:35
  • (Of course, overflow occurs for completely different values, so overflow is infrequently a problem for signed types) – Ben Voigt Sep 13 '13 at 21:35
  • @BenVoigt Also, the "unexpected" is only unexpected if one doesn't know the implicit conversion rules (that's what I called "counter-intuitive"). Fortunately, unsigned overflow is 100% precisely defined by the C and C++ standards (well, as far as my knowledge about this goes). –  Sep 13 '13 at 21:40
  • @H2CO3: Except for out-of-range shift operand, which is the only example of UB for unsigned types that I know of. – Ben Voigt Sep 13 '13 at 21:41
  • @BenVoigt Ah yes, good ol' shifts. –  Sep 13 '13 at 21:43
  • In my experience, using unsigned (with discipline) bumps up against many APIs which chose int. So I suppose this is itself an argument for using ints, since that way you're not having to deal with int-unsigned boundaries with various APIs, which is definitely a problem (however, if I had my druthers, I'd tell the API writers to be less sloppy and use unsigned for things like count, and size, and index, which cannot be negative) – Mordachai Sep 13 '13 at 21:51
  • @BenVoigt - shift happens. – Mark Ransom Sep 13 '13 at 22:17
  • I use unsigned almost everywhere (or size_t). It's really annoying to have to cast the unsigned to a signed (or ssize_t). I tell my colleagues that if the user will NEVER provide a negative value to a parameter, then it should be unsigned. – vy32 Sep 13 '13 at 22:20
  • @vy32: Why would you have to cast? You shouldn't *want* to shut up the compiler about narrowing conversions. Casts go too far in this respect. – Ben Voigt Sep 14 '13 at 00:29
  • @BenVoigt, I typically compile my programs with `-Wsign-promo` and other warning flags that cause the compiler to throw a warning if I use an unsigned variable for a signed parameter. The parameter is typically a 32-bit or 64-bit value and my values are never larger than 16M (for example). How else do I get the compiler to stop issuing the warning? I have no control over the interface, as its part of a standard library. I'm using unsigned values in my code. – vy32 Sep 14 '13 at 14:14
  • @vy32: With an inline function using implicit conversion, and the warning locally disabled for those 3 lines of code. You want to disable only signed conversion warning in particular places, but casts disable warnings about narrowing, about conversion of pointers into integers, about conversions between incompatible pointer types... – Ben Voigt Sep 14 '13 at 19:10
  • @BenVoigt, unfortunately the `#pragma` to disable the warning locally is not portable. So I end up with additional tests in the configure script, and need to include addition .h files, etc. – vy32 Sep 14 '13 at 20:15
8
  1. Use int by default: it plays nicer with the rest of the language

    • most common domain usage is regular arithmetic, not modular arithmetic
    • int main() {} // see an unsigned?
    • auto i = 0; // i is of type int
  2. Only use unsigned for modulo arithmetic and bit-twiddling (in particular shifting)

    • has different semantics than regular arithmetic, make sure it is what you want
    • bit-shifting signed types is subtle (see comments by @ChristianRau)
    • if you need a > 2 GB vector on a 32-bit machine, upgrade your OS / hardware
  3. Never mix signed and unsigned arithmetic

    • the rules for that are complicated and surprising (either one can be converted to the other, depending on the relative type sizes)
    • turn on -Wconversion -Wsign-conversion -Wsign-promo (gcc is better than Clang here)
    • the Standard Library got it wrong with std::size_t (quote from the GN13 video)
    • use range-for if you can,
    • for(auto i = 0; i < static_cast<int>(v.size()); ++i) if you must (see the sketch after this list)
  4. Don't use short or large types unless you actually need them

    • current architectures' data flow caters well to 32-bit non-pointer data (but note the comment by @BenVoigt about cache effects for smaller types)
    • char and short save space but suffer from integral promotions
    • are you really going to count beyond what an int64_t can hold?
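
A sketch of the two loop forms from guideline 3 (my example):

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{10, 20, 30};

    // Preferred: range-for sidesteps the signed/unsigned index question entirely.
    for (auto e : v)
        std::cout << e << '\n';

    // If an index is unavoidable: keep it signed and cast the size once
    // (fine as long as v.size() fits in an int).
    for (auto i = 0; i < static_cast<int>(v.size()); ++i)
        std::cout << v[i] << '\n';
}
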
TemplateRex
  • Best time performance is often dependent on how much data you can fit in cache... and then small types beat 32-bit handily. – Ben Voigt Sep 14 '13 at 00:25
  • *"bit-shifting signed types is undefined behavior"* - No, it isn't, but it *can* be. – Christian Rau Sep 14 '13 at 11:05
  • @ChristianRau thanks for pointing that out, updated. I didn't want to quote 5.8/2 in its entirety, but that was too much of a shortcut. – TemplateRex Sep 14 '13 at 11:12
  • @TemplateRex Well, unfortunately it still isn't necessarily undefined behaviour, it's undefined for left-shift and implementation-defined for right-shift. If you didn't want to quote the standard, the easiest would probably have been to just say that it *can* be undefined behaviour. Making exact statements unfortunately comes with the responsibility to be exactly right. :-) – Christian Rau Sep 14 '13 at 11:18
  • @ChristianRau correct again, updated again :-) – TemplateRex Sep 14 '13 at 11:19
  • @TemplateRex Hah, that's also a good way to put it. ;-) – Christian Rau Sep 14 '13 at 11:20
  • @ChristianRau I forgot to read 5.8/3 about the right-shift, in any case, that was the whole point of not using bitshifts on signed types: too darn subtle. – TemplateRex Sep 14 '13 at 11:22
  • You’ve given a set of guidelines but few explanations. Your convoluted `for` loop in particular needs some ’splainin’. (I’d go as far as saying it’s a *bad* guideline – use `for (auto i = 0u; i < v.size(); ++i)` instead! – or, even better, [`for (auto i : indices(x))`](https://github.com/klmr/cpp11-range).) – Konrad Rudolph Sep 14 '13 at 11:28
  • @KonradRudolph I also prefer range-for over an indexed loop. My reason for casting is to stop unsigned ints from propagating into my code base (see point 1). Unfortunately, the Standard Library uses size_t for size(), so I take the ugly path of casting, rather than the prettier path of surrendering my own variables to unsigned. – TemplateRex Sep 14 '13 at 11:37
  • @KonradRudolph I finally took a look at your `indices()` code, really cute! Let's hope the Ranges SG comes up with such functionality! – TemplateRex Sep 23 '13 at 09:22
7

To answer the actual question: for the vast majority of things, it doesn't really matter. int can be a little easier to deal with for things like subtraction where the second operand is larger than the first: you still get the "expected" (negative) result.

There is absolutely no speed difference in 99.9% of cases, because the ONLY instructions that are different for signed and unsigned numbers are:

  1. Widening the number (filling with the sign bit for signed, or zero for unsigned) - it takes the same effort to do both.
  2. Comparisons - for a signed number, the processor has to take into account whether either number is negative. But again, it's the same speed to make a compare with signed or unsigned numbers - it's just a different instruction code that says "numbers that have the highest bit set are smaller than numbers with the highest bit not set" (essentially). [Pedantically, it's nearly always the operation using the RESULT of a comparison that is different - the most common case being a conditional jump or branch instruction - but either way, it's the same effort, just that the inputs are taken to mean slightly different things.]
  3. Multiply and divide. Obviously, sign adjustment of the result needs to happen for a signed multiplication, whereas an unsigned multiplication must not change the sign of the result when the highest bit of one of the inputs is set. And again, the effort is (as near as we care about) identical.

(I think there are one or two other cases, but the result is the same - it really doesn't matter if it's signed or unsigned, the effort to perform the operation is the same for both).
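
For instance, a comparison takes the same effort either way; only the interpretation of the bits differs (my sketch, assuming two's-complement int32_t):

#include <cstdint>
#include <iostream>

int main() {
    std::uint32_t u = 0xFFFFFFFFu;  // same bit pattern as...
    std::int32_t  s = -1;           // ...this (two's complement)

    std::cout << (u > 1u) << '\n';  // 1: unsigned compare sees a huge value
    std::cout << (s > 1)  << '\n';  // 0: signed compare sees -1
}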

Mats Petersson
3

The int type more closely resembles the behavior of mathematical integers than the unsigned type.

It is naive to prefer the unsigned type simply because a situation does not require negative values to be represented.

The problem is that the unsigned type has a discontinuous behavior right next to zero. Any operation that tries to compute a small negative value instead produces some large positive value. (Worse: one whose exact magnitude depends on the implementation-defined width of the type.)

Algebraic relationships such as that a < b implies that a - b < 0 are wrecked in the unsigned domain, even for small values like a = 3 and b = 4.

A descending loop like for (i = max - 1; i >= 0; i--) fails to terminate if i is made unsigned.

Unsigned quirks can cause a problem which will affect code regardless of whether that code expects to be representing only positive quantities.

The virtue of the unsigned types is that certain operations that are not portably defined at the bit level for the signed types are defined that way for the unsigned types. The unsigned types lack a sign bit, and so shifting and masking through the sign bit isn't a problem. The unsigned types are good for bitmasks, and for code that implements precise arithmetic in a platform-independent way. Unsigned operations will simulate two's complement semantics even on a non-two's-complement machine. Writing a multi-precision (bignum) library practically requires arrays of unsigned types to be used for the representation, rather than signed types.

The unsigned types are also suitable in situations in which numbers behave like identifiers and not as arithmetic types. For instance, an IPv4 address can be represented in a 32 bit unsigned type. You wouldn't add together IPv4 addresses.
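
A sketch of the identifier use case (my example):

#include <cstdint>
#include <iostream>

int main() {
    // 192.168.0.1 packed into a 32-bit unsigned value, one octet per byte.
    std::uint32_t addr = (std::uint32_t{192} << 24) | (168u << 16) | (0u << 8) | 1u;
    std::cout << (addr >> 24) << '.' << ((addr >> 16) & 0xFFu) << '.'
              << ((addr >> 8) & 0xFFu) << '.' << (addr & 0xFFu) << '\n';
}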

Kaz
  • You surely know that modular arithmetic is perfectly mathematical, right? – GManNickG Sep 13 '13 at 22:42
  • @GManNickG That is why I said "mathematical integers" not "mathematics". In many common situations, modular arithmetic is inappropriate. – Kaz Sep 13 '13 at 22:52
  • While `for (i = max - 1; i >= 0; i--)` will not terminate, note that `for (i = max - 1; i != -1; i--)` will work as intended (and is independent from signedness of the type). – AnT stands with Russia Sep 13 '13 at 22:54
  • @Kaz: You probably meant *natural numbers*. – Ben Voigt Sep 14 '13 at 05:22
  • @BenVoigt Why would I invoke the natural numbers {1, 2, 3, ...}; they are hardly relevant here and as a type, they have drawbacks, like not being closed under subtraction, in which regard they are worse than a modular congruence. – Kaz Sep 14 '13 at 07:51
  • @Kaz: In that way they are *exactly* like signed integer types in C++. Which I thought was the topic. – Ben Voigt Sep 14 '13 at 15:33
2

int is preferred because it's most commonly used. unsigned is usually associated with bit operations. Whenever I see an unsigned, I assume it's used for bit twiddling.

If you need a bigger range, use a 64-bit integer.

If you're iterating over stuff using indexes, containers usually provide a size_type, and you shouldn't care whether it's signed or unsigned.
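
For example (my sketch):

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{1, 2, 3};
    // Let the container choose the index type; no signed/unsigned decision to make.
    for (std::vector<int>::size_type i = 0; i != v.size(); ++i)
        std::cout << v[i] << '\n';
}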

Speed is not an issue.

Luchian Grigore
  • "`int` is preferred because it's most commonly used" - yup. That. –  Sep 13 '13 at 21:37
  • Speed is very much an issue. When you store an integer, you must set a flag for whether it's positive or negative. With an unsigned you can save this step. In languages above assembler level this is not visible. – ott-- Sep 13 '13 at 21:40
  • @ott-- I don't follow. What do you mean by "set flag"? Are you saying you set one less bit for an unsigned? Like... you'd just write 31 bits? – Luchian Grigore Sep 13 '13 at 21:42
  • @ott: There are a lot, probably a majority, of values that are never negative. So your flag, and setting it, is unneeded. – Ben Voigt Sep 13 '13 at 21:43
  • @LuchianGrigore The CPU sets this flag for you. – ott-- Sep 13 '13 at 21:57
  • @ott-- probably in the very same clock cycle in which it performs the rest of the operation. –  Sep 13 '13 at 21:59
  • @BenVoigt A CPU sets even more flags like carry, overflow, odd/even, population. – ott-- Sep 13 '13 at 21:59
  • @ott-- Do you have a reference? I still can't see how using an unsigned saves on what flags are set or where. – Luchian Grigore Sep 13 '13 at 22:07
  • @ott--: Aren't the instructions for signed and unsigned addition more or less the same on most modern processors? Besides, the speed of a CPU is not determined by how many things it needs to do but by latency (number of cycles) and clock (so in effect the length of the critical path) [omitting such details as OOO execution or superscalar architecture]. So as long as it does not increase the critical path it should not have any effect on speed, and a negligible one on power consumption. – Maciej Piechotka Sep 13 '13 at 23:17
  • @MaciejPiechotka Indeed, after thinking it over it shouldn't make any difference. – ott-- Sep 13 '13 at 23:39
  • @ott: If you're talking about CPU flags set by the ALU, you should know that on many architectures those are set for both signed and unsigned. The CPU doesn't have much notion of data types. – Ben Voigt Sep 14 '13 at 00:24
  • @ott-- You know what, your CPU probably doesn't even know if your integer is signed or unsigned, it only knows integers. – Christian Rau Sep 14 '13 at 11:08
2

One good reason that I can think of is detecting overflow.

For use cases such as the count of items in an array, the length of a string, or the size of a memory block, you can overflow an unsigned int and you may not notice a difference even when you take a look at the variable. If it is a signed int, the variable will be less than zero and clearly wrong.

You can then simply check whether the variable is below zero when you want to use it. This way, you do not have to check for overflow after every arithmetic operation, as is the case for unsigned ints.

umps
  • +1 for "make it more obvious when things go wrong" – Cogwheel Sep 13 '13 at 21:44
  • I come from an assembler background - an overflow was always encoded in the CPU state flags. It would be nice to simply have access to this information, rather than needing to slice off a bit from your range in order to notice such information, no? – Mordachai Sep 13 '13 at 21:46
  • technically, c++ doesn't have to run on a cpu, let alone one with flags that provide this kind of info. overflows are undefined behavior so you're "supposed" to ensure they don't happen in the first place. But yes, it would be nice :P – Cogwheel Sep 13 '13 at 21:52
  • "If it is an signed int, the variable will be less than zero and clearly wrong" - this overflow property is true in languages like Java and C#, but not guaranteed in C/C++. In C and C++, it is undefined behavior to overflow a signed int - so your program can show a positive value or do something completely unexpected. Be careful to not rely on overflown ints as a way of checking sanity. – Nayuki Jul 08 '16 at 21:17
2

For me: beyond the integers in the range 0..+2,147,483,647 that signed and unsigned 32-bit integers have in common, there is a higher probability that I will need to use -1 (or smaller) than that I will need +2,147,483,648 (or larger).

franji1
1

It gives unexpected results when doing simple arithmetic operations:

unsigned int i;
i = 1 - 2;
// i is now 4294967295 (UINT_MAX when unsigned int is 32 bits)

It gives unexpected results when doing simple comparisons:

unsigned int j = 1;
std::cout << (j > -1) << std::endl;
// prints 0 (false), even though 1 is greater than -1

This is because in the operations above the signed values are converted to unsigned, and the result wraps around modulo 2^N to a very large number.

SwiftMango
  • Yet those aren't any more *"malfunctions"* than any other rule perfectly defined by the standard. I'd consider the undefined behaviour of signed overflow much more a *"malfunction"*. It is true that unsigned behaviour might be a bit counter-intuitive, but *"malfunction"* is definitely the wrong word here. – Christian Rau Sep 14 '13 at 11:10
  • @ChristianRau reworded – SwiftMango Sep 14 '13 at 19:09
  • Interesting. I find your examples to be perfectly sensible, and expected. Unsigned (ring) arithmetic seems far more sensible to me than integer arithmetic when dealing with counts of things that cannot be negative. If this really is the scary part of using unsigned, then I'm satisfied that the advice is merely general purpose, and mostly based on convention rather than any serious concern (when the problem domain doesn't need negative values). – Mordachai Sep 16 '13 at 13:33