Advice on unsigned int (Gangnam Style edition)

Question

The video "Gangnam Style" (I'm sure you've heard it) just exceeded 2 billion views on youtube. In fact, Google says that they never expected a video to be greater than a 32-bit integer... which alludes to the fact that Google used int instead of unsigned for their view counter. I think they had to re-write their code a bit to accommodate larger views.

Checking their style guide: https://google-styleguide.googlecode.com/svn/trunk/cppguide.html#Integer_Types

...they advise "don't use an unsigned integer type," and give one good reason why: unsigned could be buggy.

It's a good reason, but could be guarded against. My question is: is it bad coding practice in general to use unsigned int?

What a stupid rule: "don't use unsigned type". I can think of a million use cases for unsigned types. — Cory Kramer, Dec 03 '14 at 15:13
I totally disagree with that guideline. The problem with their example is that they're using `i >= 0` as a condition when `i` is an `unsigned int` - that's just stupid. Of course the condition is always going to be true. In fact, in this case, `unsigned int` has done its job! It's never negative! — Joseph Mansfield, Dec 03 '14 at 15:13
I don't agree with that rule either perhaps it is out of fear of corner cases that can happen when you mix signed and unsigned type like in my [answer here](http://stackoverflow.com/a/22047204/1708801) but honestly using the correct warning flags and static analysis should catch these issues. — Shafik Yaghmour, Dec 03 '14 at 15:16
@JosephMansfield I think they realise it's stupid, but it's something that can and has happened. And I see how it could. — keyser, Dec 03 '14 at 15:16
@keyser I can see how many mistakes can happen, but that doesn't mean I start using the wrong tools just to avoid making mistakes with the right tools. — Joseph Mansfield, Dec 03 '14 at 15:20
@JosephMansfield You shouldn't :) Some guidelines are in place for sub-optimal reasons. — keyser, Dec 03 '14 at 15:21
They don't prohibit unsigned types totally. "When appropriate, you are welcome to use standard types like **`size_t`** and `ptrdiff_t`." — Anton Savin, Dec 03 '14 at 15:36
@AntonSavin In fact, their rule is the one I've seen in every shop I've worked in. It seems to be widely accepted by professional programmers; the unsigned types in C++ are often categorized as being broken. (Probably overstatement, but they don't behave well for numeric values.) — James Kanze, Dec 03 '14 at 16:10
OTOH unsigned types are less likely to format your harddisk. — MSalters, Dec 04 '14 at 07:44
@MSalters Yes and no. Arithmetic with them never results in undefined behavior. But unless proper precautions are taken, any arithmetic can result in invalid values for the context they are being used in. Which can, in turn, result in undefined behavior. The idea that using `unsigned` as an index, rather than `int`, will guarantee that your index is in bounds, is ludicrous. (I don't think you're claiming such, but it is, fundamentally, the underlying argument in favor of using `unsigned` rather than `int` for indices.) — James Kanze, Dec 05 '14 at 12:53

score 13 · Answer 1 · answered Dec 03 '14 at 16:06

13

The Google rule is widely accepted in professional circles. The problem is that the unsigned integral types are sort of broken, and have unexpected and unnatural behavior when used for numeric values; they don't work well as a cardinal type. For example, an index into an array may never be negative, but it makes perfect sense to write abs(i1 - i2) to find the distance between two indices. Which won't work if i1 and i2 have unsigned types.

As a general rule, this particular rule in the Google style guidelines corresponds more or less to what the designers of the language intended. Any time you see something other than int, you can assume a special reason for it. If it is because of the range, it will be long or long long, or even int_least64_t. Using unsigned types is generally a signal that you're dealing with bits, rather than the numeric value of the variable, or (at least in the case of unsigned char) that you're dealing with raw memory.

With regards to the "self-documentation" of using an unsigned: this doesn't hold up, since there are almost always a lot of values that the variable cannot (or should not) take, including many positive ones. C++ doesn't have sub-range types, and the way unsigned is defined means that it cannot really be used as one either.

answered Dec 03 '14 at 16:06

James Kanze

150,581
18
184
329

3

I think "widely" is very subjective in this case. Or perhaps I've been lucky in my career.. Personally, I find unsigned numbers with their modulo arithmetic to be a lot more meaningful, mathematically, than their signed counterparts. BTW, I don't think `abs(i1-i2)` makes sense, but I like clang's warning for it. – Cubbi Dec 04 '14 at 04:57
2

Your example is flawed. `std::abs(i1 - i2)` does not even compile assuming you are including `` for the integer version and not `` for the floating point one. I get an `error: call of overloaded ‘abs(unsigned int)’ is ambiguous` in gcc. – b4hand Dec 04 '14 at 05:06
1

Note that's not a warning, it's an *error* even with no warnings enabled. – b4hand Dec 04 '14 at 05:08
Sometimes you have to write code like this: `for (size_t i = 0; i < v.size(); ++i) {...}` and you change `size_t` to `int` you'll get a warning about comparison between signed and unsigned integers. – Anton Savin Dec 04 '14 at 07:55
@b4hand I was referring to "warning: taking the absolute value of unsigned type 'unsigned int' has no effect [-Wabsolute-value] note: remove the call to 'abs' since unsigned values cannot be negative". The fun part is that it had an effect (you have to be calling unqualified (or `::abs`) for that, not `std::abs` which is ambiguous as you say) – Cubbi Dec 04 '14 at 11:58
1

@JamesKanze You make some pretty presumptuous claims in here: "This particular rule corresponds more or less to what the designers of the language intended." I have to ask why the designers made provision for `unsigned` at all then? Also: "The Google rule is widely accepted in professional circles." I'm not sure that I qualify as a veteran, but all 3 of the C++ places I have worked at did used `unsigned` parameters for indexes ect, one of them had a coding standard diametrically opposite to Google that array indexes should *only* be `unsigned` integers. – Jonathan Mee Dec 04 '14 at 14:39
2

Why speculate what the designer of C++ intended with regard to `unsigned`? Stroustrup collaborated on the guidelines for the [JTSF with Lockheed Martin](http://www.stroustrup.com/JSF-AV-rules.pdf). In those guidelines he specifically mentions required usage of `unsigned` and forbids arithmetic on them. – b4hand Dec 04 '14 at 15:51
@JonathanMee "This particular rule corresponds more or less to what the designers of the language intended": this is from personal conversations with them. `unsigned` was designed above all for bit manipulation. As for "widely accepted in professional circles": it may be a question of date: I was a contractor for many years, and have consistently seen `int` used as the standard, over a lot of different companies. But this goes back some, and the anti-pattern of preferring `unsigned` seems a more or less recent anomaly. – James Kanze Dec 05 '14 at 11:25
@b4hand I'm not speculating; I've actually talked to the people involved. As for the guidelines you cite, they correspond pretty much to what I've been saying: use unsigned for bit manipulations, but no arithmetic on unsigned (which pretty much means that they cannot be used for indices). – James Kanze Dec 05 '14 at 11:30
@JamesKanze Did you ask the designers why `unsigned` still exists then? You mentioned bit-masking, But in C++ that should really be done through a `bitset` and even if that didn't exist, negative numbers are just as effective. – Jonathan Mee Dec 05 '14 at 12:49
@JonathanMee `unsigned` still exists because there are use cases where it is appropriate. Read the Google rules, or the cited JTSF coding guidelines. `unsigned` has plenty of uses. It just isn't appropriate for arithmetic values. – James Kanze Dec 05 '14 at 13:31
1

I asked Bjarne and he is basically on your side, @JamesKanze. I'll have to re-examine some things. Perhaps the recent rise in unsigned use corresponds to the time popular compilers started treating x+1 – Cubbi Dec 05 '14 at 15:44
@Cubbi Bjarne is one of the people who has influenced my point of view. But why wouldn't a compiler treat `x + 1 < x` as false? – James Kanze Dec 05 '14 at 17:50
The bug reports filed against gcc when it started doing that were spectacular. Given that the C rationale had something to the lines of that the C community doesn't care because the results are the same "in practice" makes it seem it wasn't isolated. – Cubbi Dec 05 '14 at 18:05
@Cubbi I'm lost on the `x + 1 < x` issue. When would that ever resolve as true? – Travis Bemrose Dec 06 '14 at 06:00
2

@TravisBemrose When the integer, which is unsigned in this case, is too big to be represented as an (unsigned) integer, and so starts to "wrap" (the so called "overflow" occurs). In that case, the old value (x) is actually greter than the old one (x + 1). See http://en.wikipedia.org/wiki/Integer_overflow. – user2340939 Dec 07 '14 at 00:42
1

How does one adopt this guideline without running into all the signed/unsigned comparison warnings when dealing with containers and `size_type` (which is unsigned)? Do you just cast the result of `vector::size()` into a signed integer? – Emile Cormier Mar 12 '15 at 19:39
@EmileCormier It depends, but yes, often I'll cast the results to `int` (after ensuring that the vector can never contain more than `INT_MAX` elements. Most of the time, I'll just turn off this warning, however. If you never use unsigned integral types yourself, the only times it could occur is when dealing with the interface of thing like vector. (Also, of course: it's fairly rare to call `size` on a vector. And almost always, when you call it, it is to immediately compare it to a constant.) – James Kanze Mar 13 '15 at 13:01
@JamesKanze Thanks for the reply. I might rethink my use of unsigned integer types. BTW, I tend to use `vector::size()` in loop condition expressions, whenever I need the index for something other than accessing elements. Otherwise, I'll use iterators which obviously avoids the signed/unsigned comparison warnings. – Emile Cormier Mar 13 '15 at 15:36

b4hand · Answer 2 · 2014-12-05T00:45:05.257

This guideline is extremely misleading. Blindly using int instead of unsigned int won't solve anything. That simply shifts the problems somewhere else. You absolutely must be aware of integer overflow when doing arithmetic on fixed precision integers. If your code is written in a way that it does not handle integer overflow gracefully for some given inputs, then your code is broken regardless of whether you use signed or unsigned ints. With unsigned ints you must be aware of integer underflow as well, and with doubles and floats you must be aware of many additional issues with floating point arithmetic.

Just take this article about a bug in the standard Java binary search algorithm published by none other than Google for why you must be aware of integer overflow. In fact, that very article shows C++ code casting to unsigned int in order to guarantee correct behavior. The article also starts out by presenting a bug in Java where guess what, they don't have unsigned int. However, they still ran into a bug with integer overflow.

You must be aware of the valid ranges for any value, regardless of whether it is signed or not. The real issue here is that some authors seem to recommend using `unsigned` as if it were a sub-range type. Which it isn't; C++ doesn't have sub-range types. — James Kanze, Dec 05 '14 at 11:32
I don't think over/underflow is the issue. I think the issue is the (often overlooked) pitfalls of trying to do arithmetic with unsigned. — Travis Bemrose, Dec 06 '14 at 06:05

score 5 · Answer 3 · answered Dec 04 '14 at 07:53

Use the right type for the operations which you will perform. float wouldn't make sense for a counter. Nor does signed int. The normal operations on the counter are print and +=1.

Even if you had some unusual operations, such as printing the difference in viewcounts, you wouldn't necessarily have a problem. Sure, other answers mention the incorrect abs(i2-i1) but it's not unreasonable to expect programmers to use the correct max(i2,i1) - min(i2,i1). Which does have range issues for signed int. No uniform solution here; programmers should understand the properties of the types they're working with.

Jonathan Mee · Answer 4 · 2014-12-03T16:48:24.890

2

Google states that: "Some people, including some textbook authors, recommend using unsigned types to represent numbers that are never negative. This is intended as a form of self-documentation."

I personally use unsigned ints as index parameters.

int foo(unsigned int index, int* myArray){
    return myArray[index];
}

Google suggests: "Document that a variable is non-negative using assertions. Don't use an unsigned type."

int foo(int index, int* myArray){
    assert(index >= 0);
    return myArray[index];
}

Pro for Google: If a negative number is passed in debug mode my code will hopefully return an out of bounds error. Google's code is guaranteed to assert.

Pro for me: My code can support a greater size of myArray.

I think the actual deciding factor comes down to, how clean is your code? If you clean up all warnings, it will be clear when the compiler warns you know when you're trying to assign a signed variable to an unsigned variable. If your code already has a bunch of warnings, the compiler's warning is going to be lost on you.

A final note here: Google says: "Sometimes gcc will notice this bug and warn you, but often it will not." I haven't seen that to be the case on Visual Studio, checks against negative numbers and assignments from signed to unsigned are always warned. But if you use gcc you might have a care.

edited Dec 03 '14 at 16:48

answered Dec 03 '14 at 15:49

Jonathan Mee

37,899
23
129
288

1

Your _pro_ is actually the initial motivation behind requiring `size_t` to be an unsigned type. It was valid on 16 bit machines (which were widespread back then), but really isn't very relevant today. The problem is that C++ (and C) unsigned types do not model integers, or even the set of non-negative integers very well, and should be avoided for such. – James Kanze Dec 03 '14 at 16:08
@JamesKanze I agree with your statement about `size_t` I'm not certain that I understand your statement, "unsigned types do not model the set of non-negative integers very well." – Jonathan Mee Dec 03 '14 at 16:14
1

For example, subtraction isn't defined over the set of non-negative integers. Either the set of non-negative integers is a subset of the set of integers, and subtraction returns a result which may be negative, or you don't support subtraction. if `abs(i1 - i2)` is legal, for example, it must result in the difference between the two values. – James Kanze Dec 03 '14 at 16:36
@JamesKanze This is sadly very opinion based. We can give examples of problems with one side or the other, but I think at the end of the day either way can show examples of poorly written code. If I write a function that takes in two `unsigned int`s and then I try to do `abs(i1 - i2)` on them that's just a poorly written function. It's just as bad as not doing an `assert(index >= 0)`. Either way you choose there will be difficulties. – Jonathan Mee Dec 03 '14 at 16:49
1

`std::abs(i1 - i2)` does not even compile assuming you are including `` for the integer version and not `` for the floating point one. I get an `error: call of overloaded ‘abs(unsigned int)’ is ambiguous` in gcc. – b4hand Dec 04 '14 at 05:05
2

@JamesKanze Subtraction is perfectly well defined on the finite set of non-negative integers modulo some non-negative integer. The entire field of study of modulo arithmetic relies on its definition. In fact I'd say there are many more people intimately familiar with the behavior of subtraction in modular arithmetic than there are twos complement arithmetic. – b4hand Dec 04 '14 at 16:51
@b4hand "Subtraction is perfectly well defined on the finite set of non-negative integers": in C++ and C, yes. It is precisely defined in a way that is wrong for most uses. – James Kanze Dec 05 '14 at 11:34
1

@JamesKanze "It is defined is a way that is wrong for most uses." I know you're just trying to make a point here, but "most uses" really? – Jonathan Mee Dec 05 '14 at 12:37
@JonathanMee Yes. I should have been more precise. By "most uses", I really mean the cases where the type (`int`, etc.) is an abstraction for the set of integers; in other words, where it is an arithmetic, numerical value. (And whether "most uses" is appropriate for this, I don't know. I've worked on applications where most of the integers were serial numbers or identifiers. And since arithmetic operations make no sense on these, it really doesn't matter what integral type they are, as long as it is large enough to contain the needed values.) – James Kanze Dec 05 '14 at 12:47

score 1 · Answer 5 · edited Nov 16 '15 at 22:25

You specific question is:

"Is it bad practice to use unsigned?" to which the only correct answer can be no. It is not bad practice.

There are many style guides, each with a different focus, and while in some cases, an organisation, given their typical toolchain and deployment platform may choose not to use unsigned for their products, other toolchains and platforms almost demand it's use.

Google seem to get a lot of deference because they have a good business model (and probably employ some smart people like everyone else).

CERT IIRC recommend unsigned for buffer indexes, because if you do overflow, at least you'll still be in your own buffer, some intrinsic security there.

What do the language and standard library designers say (probably the best representation of accepted wisdom). strlen returns a size_t, which is probably unsigned (platform dependent), other answers suggest this is an anachronism because shiny new computers have wide architectures, but this misses the point that C and C++ are general programming languages and should scale well on big and small platforms.

Bottom line is that this is one of many religious questions; certainly not settled, and in these cases, I normally go with my religion for green field developments, and go with the existing convention of the codebase for existing work. Consistency matters.

Advice on unsigned int (Gangnam Style edition)

5 Answers5

Linked