14

Referring to this guide:
https://google.github.io/styleguide/cppguide.html#Integer_Types

Google suggests to use int in the most of time.
I try to follow this guide and the only problem is with STL containers.


Example 1.

void setElement(int index, int value)
{
    if (index > someExternalVector.size()) return;
    ...
}

Comparing index and .size() is generating a warning.

Example 2.

for (int i = 0; i < someExternalVector.size(); ++i)
{
    ...
}

Same warning between i and .size().


If I declare index or i as unsigned int, the warning is off, but the type declaration will propagate, then I have to declare more variables as unsigned int, then it contradicts the guide and loses consistency.

The best way I can think is to use a cast like:

if (index > static_cast<int>(someExternalVector.size())

or

for (int i = 0; i < static_cast<int>(someExternalVector.size()); ++i)

But I really don't like the casts.

Any suggestion?


Some detailed thoughts below:

To advantage to use only signed integer is like: I can avoid signed/unsigned warnings, castings, and be sure every value can be negative(to be consistent), so -1 could be used to represent invalid values.

There are many cases that the usage of loop counters are mixed with some other constants or struct members. So it would be problematic if signed/unsigned is not consistent. There will be full of warnings and castings.

Marson Mao
  • 2,935
  • 6
  • 30
  • 45
  • C++11's `auto` and `decltype` sure would help on that. You should also read [**this**](http://herbsutter.com/2013/06/13/gotw-93-solution-auto-variables-part-2/) article by Herb Sutter. – Mark Garcia Jun 19 '13 at 03:57
  • Thanks. But C++11 may not be my solution right now...our project cant support that :( – Marson Mao Jun 19 '13 at 04:24
  • What is your compiler? C++ itself does not dictate warnings. As an aside, have you tried using a 64 bit or larger integer type? There are some quirks around how `int` interacts with `unsigned` types of its size and larger... – Yakk - Adam Nevraumont Jun 19 '13 at 04:28
  • I'm using visual studio 2008 as compiler. – Marson Mao Jun 19 '13 at 05:31
  • 1
    Is there a reason you want to follow that so-called "guide"? – Cubbi Jun 19 '13 at 17:37
  • basically I want to be consistent and not to worry about signed/unsigned anymore. I can be sure every integer can have negative values, and I can use -1 for invalid values for every integer. Also there won't be any warning about signed/unsigned anymore. No cast for signed/unsigned anymore. – Marson Mao Jun 20 '13 at 03:27
  • @MarsonMao If you don't want to worry about `signed/unsigned`, don't use them, use the type returned by the function giving you the value, for which I suggest `auto`. This has been supported since GCC 4.4/Clang 2.9(!)/MSVC 2010. – rubenvb Jun 20 '13 at 08:05

2 Answers2

12

Unsigned types have three characteristics, one of which is qualitatively 'good' and one of which is qualitatively 'bad':

  • They can hold twice as many values as the same-sized signed type (good)
  • The size_t version (that is, 32-bit on a 32-bit machine, 64-bit on a 64-bit machine, etc) is useful for representing memory (addresses, sizes, etc) (neutral)
  • They wrap below 0, so subtracting 1 in a loop or using -1 to represent an invalid index can cause bugs (bad.) Signed types wrap too.

The STL uses unsigned types because of the first two points above: in order to not limit the potential size of array-like classes such as vector and deque (although you have to question how often you would want 4294967296 elements in a data structure); because a negative value will never be a valid index into most data structures; and because size_t is the correct type to use for representing anything to do with memory, such as the size of a struct, and related things such as the length of a string (see below.) That's not necessarily a good reason to use it for indexes or other non-memory purposes such as a loop variable. The reason it's best practice to do so in C++ is kind of a reverse construction, because it's what's used in the containers as well as other methods, and once used the rest of the code has to match to avoid the same problem you are encountering.

You should use a signed type when the value can become negative.

You should use an unsigned type when the value cannot become negative (possibly different to 'should not'.)

You should use size_t when handling memory sizes (the result of sizeof, often things like string lengths, etc.) It is often chosen as a default unsigned type to use, because it matches the platform the code is compiled for. For example, the length of a string is size_t because a string can only ever have 0 or more elements, and there is no reason to limit a string's length method arbitrarily shorter than what can be represented on the platform, such as a 16-bit length (0-65535) on a 32-bit platform. Note (thanks commenter Morwen) std::intptr_t or std::uintptr_t which are conceptually similar - will always be the right size for your platform - and should be used for memory addresses if you want something that's not a pointer. Note 2 (thanks commenter rubenvb) that a string can only hold size_t-1 elements due to the value of npos. Details below.

This means that if you use -1 to represent an invalid value, you should use signed integers. If you use a loop to iterate backwards over your data, you should consider using a signed integer if you are not certain that the loop construct is correct (and as noted in one of the other answers, they are easy to get wrong.) IMO, you should not resort to tricks to ensure the code works - if code requires tricks, that's often a danger signal. In addition, it will be harder to understand for those following you and reading your code. Both these are reasons not to follow @Jasmin Gray's answer above.

Iterators

However, using integer-based loops to iterate over the contents of a data structure is the wrong way to do it in C++, so in a sense the argument over signed vs unsigned for loops is moot. You should use an iterator instead:

std::vector<foo> bar;
for (std::vector<foo>::const_iterator it = bar.begin(); it != bar.end(); ++it) {
  // Access using *it or it->, e.g.:
  const foo & a = *it;

When you do this, you don't need to worry about casts, signedness, etc.

Iterators can be forward (as above) or reverse, for iterating backwards. Use the same syntax of it != bar.end(), because end() signals the end of the iteration, not the end of the underlying conceptual array, tree, or other structure.

In other words, the answer to your question 'Should I use int or unsigned int when working with STL containers?' is 'Neither. Use iterators instead.' Read more about:

What's left?

If you don't use an integer type for loops, what's left? Your own values, which are dependent on your data, but which in your case include using -1 for an invalid value. This is simple. Use signed. Just be consistent.

I am a big believer in using natural types, such as enums, and signed integers fit into this. They match our conceptual expectation more closely. When your mind and the code are aligned, you are less likely to write buggy code and more likely to expressively write correct, clean code.

Community
  • 1
  • 1
David
  • 13,360
  • 7
  • 66
  • 130
  • 1
    For memory, it would be better to use `std::intptr_t` or `std::uintptr_t` which are defined as always capable of holding a pointer. – Morwenn Jun 20 '13 at 07:31
  • 1
    +1 for iterators. Also note that `std::string::npos` is defined in the Standard as `std::string::size_type(-1)`. Does this mean the Standard made a mistake? I don't think so. Using `some_unsigned_type(-1)` is perfectly fine. Just know you will be able to store one element less than the size of the `size_type`. – rubenvb Jun 20 '13 at 07:51
  • Thanks @rubenvb: I edited to mention this and refer to your comment. I had forgotten all about npos when writing this answer and hadn't considered its effect. Using some_unsigned_type(-1) is fine (I don't have any concerns about the standard!), just so long as it's deliberate, not a bug... – David Jun 20 '13 at 08:26
  • Thanks for the reply! Iterator is a good idea, I often tries to use it. But when I use vector as an array, I have to care about the input index or the loop counter (as my examples in the post), I need to check the index with vector size, or I pass the loop counter to another function that accept the counter (and these values could be `< 0`). In that case, would you recommend to type cast vector size? i.e. `static_cast(Vector.size())`? – Marson Mao Jun 21 '13 at 02:08
  • Can your input index be -1, eg to indicate no index or an invalid object? You can use the array operator [] with int, no problem. You can cast the vector size to int to compare, if you are sure you have less than 2 billion values in your vector... which is probably a fairly safe bet :) – David Jun 21 '13 at 12:27
  • Hey truly sorry for late reply, I missed the notification. Yes -1 is indicating no index or invalid object. And yes my containers are all small, only up to like at most 10K elements, so I try to cast it to signed as current solution. – Marson Mao Jun 24 '13 at 02:06
  • 5
    Well iterators and range-based `for`s are of course nice, sometimes you just can't just get away with using of indexes, iterating over 2 or more containers simultaneously and stuff like that. (it can be done with iterators but with too much of additional code which actually may be less understandable). And actually the thing that is bothering me about signed/unsigned problem is that if you had loop using `unsigned` as index and then change it to iterate in reversed order, it will lead to non-obvious mistake. – Predelnik May 08 '14 at 11:06
  • Good answer apart from "You should use an unsigned type when the value cannot become negative". See Stroustrup's quote from http://stackoverflow.com/questions/10168079/why-is-size-t-unsigned – Jon May 28 '14 at 01:53
  • 2
    Jamin Gray's answer is not _above_. One has to be careful when using "above" or "below" referring to other answers on this site because such references can be invalidated by votes. – Ruslan Jan 26 '15 at 11:20
  • "You should use an unsigned type when the value cannot become negative " is very bad advice. Instead, use a signed type and assert that it's positive when appropriate. – Nir Friedman May 19 '15 at 14:26
  • @NirFriedman Agreed, except that the STL and other common C++ code does this *all the time* - so much so it's idiomatic. I wrote "can not" not "should not", too - there is a subtle distinction. I personally prefer signed integers in all cases but using them in many places causes too many problems / warnings interacting with library C++ code. – David May 24 '15 at 15:09
  • Using an unsigned type because it's returned from the STL is a strict subset of using unsigned to represent non-negative. What you wrote sounds like an endorsement of e.g. using unsigned in a function signature for non negative parameter. I agree using signed with STL is annoying. I use iterators wherever possible; sometimes cast. I never use unsigned for comparisons, arithmetic, it's dangerous and slow. – Nir Friedman May 24 '15 at 18:51
4

Use the type that the container returns. In this case, size_t - which is an integer type that is unsigned. (To be technical, it's std::vector<MyType>::size_type, but that's usually defined to size_t, so you're safe using size_t. unsigned is also fine)

But in general, use the right tool for the right job. Is the 'index' ever supposed to be negative? If not, don't make it signed.

By the by, you don't have to type out 'unsigned int'. 'unsigned' is shorthand for the same variable type:

int myVar1;
unsigned myVar2;

The page linked to in the original question said:

Some people, including some textbook authors, recommend using unsigned types to represent numbers that are never negative. This is intended as a form of self-documentation. However, in C, the advantages of such documentation are outweighed by the real bugs it can introduce.

It's not just self-documentation, it's use the right tool for the right job. Saying that 'unsigned variables can cause bugs so don't use unsigned variables' is silly. Signed variables can also cause bugs. So can floats (more than integers). The only guaranteed bug-free code is code that doesn't exist.

Their example of why unsigned is evil, is this loop:

for (unsigned int i = foo.Length()-1; i >= 0; --i)

I have difficulty iterating backwards over a loop, and I usually make mistakes (with signed or unsigned integers) with it. Do I subtract one from size? Do I make it greater-than-AND-equal-to 0, or just greater than? It's a sloppy situation to begin with.

So what do you do with code that you know you have problems with? You change your coding style to fix the problem, make it simpler, and make it easier to read, and make it easier to remember. There is a bug in the loop they posted. The bug is, they wanted to allow a value below zero, but they chose to make it unsigned. It's their mistake.

But here's a simple trick that makes it easier to read, remember, write, and run. With unsigned variables. Here's the intelligent thing to do (obviously, this is my opinion).

for(unsigned i = myContainer.size(); i--> 0; )
{
    std::cout << myContainer[i] << std::endl;
}

It's unsigned. It always works. No negative to the starting size. No worrying about underflows. It just works. It's just smart. Do it right, don't stop using unsigned variables because someone somewhere once said they had a mistake with a for() loop and failed to train themselves to not make the mistake.

The trick to remembering it:

  1. Set 'i' to the size. (don't worry about subtracting one)
  2. Make 'i' point to 0 like an arrow. i --> 0 (it's a combination of post-decrementing (i--) and greater-than comparison (i > 0))

It's better to teach yourself tricks to code right, then to throw away tools because you don't code right.

Which would you want to see in your code?

for(unsigned i = myContainer.size()-1; i >= 0; --i)

Or:

for(unsigned i = myContainer.size(); i--> 0; )

Not because it's less characters to type (that'd be silly), but because it's less mental clutter. It's simpler to mentally parse when skimming through code, and easier to spot mistakes.


Try the code yourself

Jamin Grey
  • 10,151
  • 6
  • 39
  • 52
  • Thanks for fast reply, but please read the link I gave. To quote the text here: "Some people, including some textbook authors, recommend using unsigned types to represent numbers that are never negative. This is intended as a form of self-documentation. However, in C, the advantages of such documentation are outweighed by the real bugs it can introduce". – Marson Mao Jun 19 '13 at 04:23
  • 1
    On 64-bit machines, `size_t` is a 64-bit unsigned integer type, not `unsigned int`. On x86/64 CPUs, `size_t` is equivalent to `unsigned long`. – user172818 Jun 19 '13 at 04:57
  • 1
    @MarsonMao You are using C++, not C. Yes, making buggy code can (surprise!) be buggy. You can make buggy code with signed or unsigned ints. That's not a reason to not use unsigned ints. In the example they give, of negative iteration, there's a simple trick to iterate properly, which I'll post in the answer. – Jamin Grey Jun 19 '13 at 16:40
  • 2
    @user172818 I said it's "an unsigned integer type" - I didn't mean an 'unsigned int'. An unsigned long is also "an unsigned integer type". I'll try to clarify this in the answer, thanks! – Jamin Grey Jun 19 '13 at 17:16
  • Hi, thanks for the reply. There are more reason than that for-loop example. To use only signed integer I can avoid signed/unsigned warnings, castings, and be sure every value can be negative(to be consistent), and I can always use -1 for invalid value. There are many cases that it mixes loop counter with some constant or struct member, etc. If signed/unsigned is not consistent then there will always be warnings and castings, which is annoying and meaningless. So that's why I want to unify all the integers to be signed. – Marson Mao Jun 20 '13 at 03:33
  • 1
    [`-->` - ugh and 'wtf', to quote.](http://stackoverflow.com/questions/1642028/what-is-the-name-of-this-operator) Using unsigned just because signed can also have bugs is no justification, and if you have to use tricks to ensure you can use them properly, that's a danger sign - not a reason to say they're okay. Signed is a far more natural type, safer in more situations. @MarsonMao, IMO, your text is correct about bugs outweighing advantages. From the convolutions in the above reply you can probably see how they are not reliably easy to use correctly. – David Jun 20 '13 at 06:46
  • the `goes to` operator is obfuscated at best. `sizeof(std::size_t) >= sizeof(unsigned)`, on e.g. Windows `unsigned` might not suffice if the number of elements exceeds `2^32-1`, which may well happen; it's only 4 gigabytes worth of `char`s.. – rubenvb Jun 20 '13 at 07:50
  • 1
    "Which would you want to see in your code?" `(i--) > 0`, obviously. `i-->0` looks a lot like `i -> 0` and reminds me of erlang code. Such syntax will confuse the heck out of maintenance guy. "because someone somewhere once said they had a mistake and failed to train themselves" I think such attitude is unhealthy. Unless you're a one-man-army software developer, you won't get away with it. Your code is written for people. And there's a very good chance that people that will read it are less skilled than you or even completely incompetent. For those people, your code should be easy to read. – SigTerm Jun 20 '13 at 08:42
  • @SigTerm I agree about "(i--) > 0". It's true that I'm a solo developer atm, so that's a good point about --> being not reader-friendly. It's still a useful learning tool. – Jamin Grey Jun 21 '13 at 23:07
  • 1
    @MarsonMao "To use only signed integer I can avoid signed/unsigned warnings". You want to fix the warnings, instead of hiding them. My code posted above doesn't contain castings, either. "and be sure every value can be negative(to be consistent)". That's not an argument. A signed integer is a different variable than an unsigned integer. It's not being consistent, it's being sloppy! =P You might as well make everything floats so, 'every value can contain decimal points, to be consistent'. Different tool, different job. Don't replace nails with screws (or vise-versa) "just to be consistent". – Jamin Grey Jun 21 '13 at 23:15
  • 1
    @MarsonMao "and I can always use -1 for invalid value." - proper error handling is better. If your number is supposed to be negative, -1 as a invalid value makes your code buggy. The code example in the google document is *buggy*. Instead of *fixing* the bug, they changed what type of variable it was, and called it a feature. 'The bug is gone, so it must be good right?'. No, fixing the code is good. – Jamin Grey Jun 21 '13 at 23:16
  • 1
    @MarsonMao You are throwing out a tool that is intentionally in the language, just to avoid fixing the code. The warnings indicate something is wrong. The casts document intentional behavior. Permitting code that isn't support to be negative to accidentally allow negative numbers, 'just in case', is the wrong way to program. – Jamin Grey Jun 21 '13 at 23:16
  • 1
    @JaminGrey: "You want to fix the warnings, instead of hiding them". [You cannot fix all warnings on all compilers](http://www.zlib.net/zlib_faq.html#faq35). Try compiling projects with Microsoft compiler with -WAll. You'll get *thousands* of warnings, even on harmless code. When you see a warning, you should ensure that you know what you're doing, and hide the warning, if it is not helpful. – SigTerm Jun 22 '13 at 01:58
  • @JaminGrey: I'd advise to get 3..5 more years of programming experience + some time in the most horrible programming team you can find. That'll change your opinion about useful practices. At the moment, what you say is too perfectionist: "proper error handling is better" (1: define "proper"/"better". 2: If your project forbids using STL and exceptions then what?). Perfectionism is harmful, and trying to write perfect code is very different from getting things done. You need to balance development time,code quality, maintainability,readability,performance. Failure to balance can get you fired. – SigTerm Jun 22 '13 at 02:07
  • 1
    @SigTerm We're talking about *guidelines*. Guidelines *should* aim for perfection, and code should document where reality deviates from perfection. "Useful practices" are things that are good to follow, but are not laws or rules. Guide lines are *lines* that *guide*. They aren't walls that constrain. Example of a guideline: "If an integer shouldn't be able go into negative, don't make it signed.". An example of a bad law: "Never use unsigned integers, because they sometimes cause warnings.". Ofcourse there is a balance! But any "balance" that says, 'never use unsigned integers' is **wrong**. – Jamin Grey Jun 22 '13 at 06:05
  • 1
    @SigTerm "*You cannot fix all warnings on all compilers.*" So instead we'll ignore them? If we're establishing practices, guidelines, and suggestions, we should aim for the general case. In general, most warnings are useful and should be noted. Warnings aren't errors, but they are warning you for a reason. Casting just to stop a warning, without first understanding what the warning is trying to communicate, is not how a good programmer programs - that's how bugs get hidden. – Jamin Grey Jun 22 '13 at 06:08
  • @JaminGrey: "we". Who, exactly, are "we"? "we'll ignore them?" When warning appears, you should check it, and if your code does what you want it to do, BUT warning still appears, you should disable the warning. "If we're establishing practices," You, as a programmer, ARE NOT establishing practices. You're solving problems and get things done. "they are warning you for a reason" and that reason might be "print this in case user doesn't know what he is doing". "most warnings" are useless (/W4), generate noise, which is a bad thing. – SigTerm Jun 22 '13 at 11:30