188

Many style guides, such as Google's, recommend using int as the default integer, for instance when indexing arrays. But with the rise of 64-bit platforms, an int is most of the time only 32 bits, which is not the natural width of the platform. As a consequence, I see no reason, apart from the simplicity of the name, to keep that choice. We can see this clearly when compiling the following code:

double get(const double* p, int k) {
  return p[k];
}

which gets compiled into

movslq %esi, %rsi
vmovsd (%rdi,%rsi,8), %xmm0
ret

where the first instruction promotes the 32-bit integer into a 64-bit integer.

If the code is transformed into

double get(const double* p, std::ptrdiff_t k) {
  return p[k];
}

the generated assembly is now

vmovsd (%rdi,%rsi,8), %xmm0
ret

which clearly shows that the CPU feels more at home with std::ptrdiff_t than with an int. Many C++ users have moved to std::size_t, but I don't want to use unsigned integers unless I really need modulo 2^n behaviour.

In most cases, using int does not hurt performance, as the undefined behaviour of signed integer overflow allows the compiler to internally promote any int to a std::ptrdiff_t in loops that deal with indices; but we clearly see from the above that the compiler does not feel at home with int. Also, using std::ptrdiff_t on a 64-bit platform would make overflows less likely to happen, as I see more and more people getting trapped by int overflows when they have to deal with integers larger than 2^31 - 1, which is becoming really common these days.
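
As a minimal sketch (my own example) of the kind of loop I have in mind, where the compiler may keep the int index in a 64-bit register for the whole loop precisely because signed overflow is undefined:

double sum(const double* p, int n) {
  double s = 0.0;
  for (int i = 0; i < n; ++i)  // i can never legally wrap, so the compiler may widen it once
    s += p[i];
  return s;
}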

From what I have seen, the only thing that makes int stand apart seems to be the fact that literals such as 5 are int, but I don't see where it might cause any problem if we move to std::ptrdiff_t as a default integer.

I am on the verge of making std::ptrdiff_t the de facto standard integer for all the code written in my small company. Is there a reason why it could be a bad choice?

PS: I agree that the name std::ptrdiff_t is ugly, which is the reason why I have typedef'ed it to il::int_t, which looks a bit better.
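
For reference, the typedef itself is trivial (a minimal sketch of what I use):

#include <cstddef>

namespace il {
using int_t = std::ptrdiff_t;  // signed, pointer-sized default integer
}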

PS: As I know that many people will recommend that I use std::size_t as a default integer, I really want to make it clear that I don't want to use an unsigned integer as my default integer. The use of std::size_t as a default integer in the STL has been a mistake, as acknowledged by Bjarne Stroustrup and the standard committee in the video "Interactive Panel: Ask Us Anything" at 42:38 and 1:02:50.

PS: In terms of performance, on any 64-bit platform that I know of, +, - and * get compiled the same way for both int and std::ptrdiff_t, so there is no difference in speed. If you divide by a compile-time constant, the speed is also the same. It's only when you divide a/b, knowing nothing about b, that using a 32-bit integer on a 64-bit platform gives you a slight performance advantage. But this case is so rare that I don't see it as a reason to move away from std::ptrdiff_t. When we deal with vectorized code, there is a clear difference, and the smaller the better, but that's a different story, and there would be no reason to stick with int. In those cases, I would recommend going to the fixed-size types of C++.
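
As a sketch of what I mean for the vectorized case (an illustrative example of mine, using the fixed-size types from <cstdint>):

#include <cstdint>

// Twice as many 32-bit lanes fit in a vector register as 64-bit ones,
// so smaller fixed-size elements help throughput here:
void scale(std::int32_t* a, std::int64_t n, std::int32_t k) {
  for (std::int64_t i = 0; i < n; ++i)
    a[i] *= k;
}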

InsideLoop
  • 34
    "Many style guides such as the Google one recommend to use int as a default integer, when indexing arrays for instance" - citation needed. You should always use `size_t` or `size_type` (STL). – Dai Feb 11 '18 at 07:42
  • 30
    *"which gets compiled into"* - With what flags and by which compiler? – StoryTeller - Unslander Monica Feb 11 '18 at 07:43
  • 33
    Using `int` to index into arrays is simply wrong as there's no guarantee `int` is large enough to cover all possible indices in an array. `std::size_t` is the right type for that. – user703016 Feb 11 '18 at 07:44
  • 4
    That promotion is only necessary, because the other operand is a pointer and because the int is passed through a non-inlined function interface. In "normal integer" code, you shouldn't see any difference. – MikeMB Feb 11 '18 at 07:45
  • 75
    The Google style guide is mostly crap. I'd stay away from it. – Jesper Juhl Feb 11 '18 at 07:54
  • 34
    "I don't want to use unsigned integers unless I really need modulo 2^n behaviour." -- you don't want to pass a negative array index either... (*Undefined Behavior* in C) – David C. Rankin Feb 11 '18 at 07:54
  • 4
    Integers used in array indexes are also used in other mathematical operations that might result in negative values. For example, subtracting one index from another. – Benjamin Lindley Feb 11 '18 at 07:58
  • 7
    @DavidC.Rankin Treating indexes as signed sometimes simplifies algorithms. You can write, e.g., `while (i >= 0) ...` and do some subtracting inside the loop. – Daniel Langr Feb 11 '18 at 08:04
  • 13
    @DavidC.Rankin: How does an unsigned type help with that? Instead of UB due to a negative index in `a[i]`, you will either get UB due to a positive out-of-bounds index or defined but undesired behaviour. The bug has already happened before that; `i` must not be negative. You should therefore use a signed number so that you can *detect* the error (e.g. `assert(i >= 0);`). – Christian Hackl Feb 11 '18 at 08:06
  • 1
    What?? You can't presume using an unsigned index will result in UB due to positive out of bounds unless you are dumb enough not to protect your array bounds. One doesn't always follow the other and I'm not saying that you don't want to use `int` as an index - God, that is done all the time. The comment was a response to the quote and nothing more. – David C. Rankin Feb 11 '18 at 08:13
  • 8
    @DavidC.Rankin: I don't understand what you are trying to say. – Christian Hackl Feb 11 '18 at 08:15
  • 1
    It is nothing but a rhetorical analogy. You don't want to refrain from using an unsigned index any more than you want to pass a negative index. Sorry for the philosophical paradox. – David C. Rankin Feb 11 '18 at 08:18
  • 3
    How does "default integer" even make sense? It's not like there are "default requirements" for any variable or application range... – Andreas Feb 11 '18 at 08:26
  • @Andreas: For example, `auto i = 0;`. – Christian Hackl Feb 11 '18 at 08:27
  • 3
    Personally I prefer using int32_t, uint64_t and the like, as it is very clear how your variable is supposed to behave in regards to signed/unsigned and size in memory – Eyal K. Feb 11 '18 at 10:03
  • 1
    Herb Sutter wrote on C++ Core Guidelines, that ptrdiff_t is recommended. See answer. – Robert Andrzejuk Feb 11 '18 at 10:44
  • 1
    Not an answer, but every time this topic gets brought up, I remember this article: https://www.viva64.com/en/a/0050/ – CookiePLMonster Feb 11 '18 at 11:06
  • 6
    IMO I know some feel `std::ptrdiff_t` is best because it is signed. I find the reasons for choosing signed over unsigned unconvincing at best. The fact is it will cause a real headache for anyone using the `STL` (which should be everyone) and any supposed benefits are marginal compared with the awkwardness throughout the codebase. My philosophy is use the natural type for the job. Arrays do not have negative indexes. Treat *math* differently to indexing. – Galik Feb 11 '18 at 11:38
  • 1
    I've never read the Google style guide - I be afeared of it or, more accurately, its likely effect on my blood pressure. – Martin James Feb 11 '18 at 11:42
  • To start with, negative/overflow indices usually result in spectacular fails. Such crashes are easily fixed and highly preferable to sneaky bugs that need a pile of effort to dig out. If an index is supposed to go from 0 to 255, I'll use an int, and I don't care what Google says:) – Martin James Feb 11 '18 at 11:50
  • There is a question here regarding a signed `int` used in a loop that could possibly be due to the `UB` of signed integer overflow. https://stackoverflow.com/questions/48731306/program-behaving-strangely-on-online-ides#48731306 At least unsigned overflow is well defined. – Galik Feb 11 '18 at 12:20
  • 9
    The title is bizarre to me. Yes, there absolutely is. If you can't think of a situation where `int` is sufficient (in terms of its required range), then you must have a very niche set of requirements and narrow view of what other people are doing with such a broadly usable language. Thinking everyone else should have to use verbose monstrosities like `std::ptrdiff_t` even in situations where it's overkill is absurd. If you're only talking about indexing into containers, then fix your title not to imply *all* uses of `int`. – underscore_d Feb 11 '18 at 13:22
  • @underscore_d: The title is completely fine. Can you give an example where `int` is the best solution (not sufficient, but the best)? Because if for every situation, there is a better alternative to `int` (like `ptrdiff_t`, `intXX_t`, `int_leastXX_t`), then there is no reason to use it. – geza Feb 11 '18 at 14:18
  • 2
    The title question is teasing; the question is actually "Is there any reason to use int for indexing in C++?", the answer is trivial and there are already many answers on SO for this. Hope this behavior will not go viral. – Oliv Feb 11 '18 at 17:03
  • 4
    @JesperJuhl: The Google style guide is designed for use at Google. It works very well for that specific purpose. It is not intended for general consumption unless you're going to be committing code to one of Google's open-source projects (or you have [an irrational fear of exceptions](https://google.github.io/styleguide/cppguide.html#Exceptions)). – Kevin Feb 11 '18 at 17:24
  • @Kevin agreed. Which also means that it is (likely) crap for most non-google uses, since it is written for *one* very specific use-case. – Jesper Juhl Feb 11 '18 at 17:27
  • 1
    @geza Print the numbers from 1 to 10: `for(int i = 1; i <= 10; i++) printf("%d\n", i);` – Steve Summit Feb 11 '18 at 18:21
  • @SteveSummit: I accept this example :) That's a rare occasion where `int` is fine. But, if you're in a little bit larger program, and have a short named 16-bit integer typedef (like `s16`), that's equally good as `int`. And, if you want to print numbers 1 to 50000 portably, you cannot use `int` any more. But you can use `s32`. – geza Feb 11 '18 at 18:37
  • 3
    less instructions != faster code. You need to measure to find out. For example, what is the cost of the increased data size when using 64-bit indexes? Do you get less available registers? Do you prevent the compiler from parallelizing 32-bit data operations? Does all this even matter in the end? – Nikos C. Feb 11 '18 at 18:43
  • @geza I'm not going to get into a long back-and-forth on this, but I hope you know that `int16_t` is a significantly poor choice for that sort of thing. – Steve Summit Feb 11 '18 at 18:44
  • @SteveSummit: for your example, there is nothing wrong with using `int16_t`. With an optimizing compiler, you'll get the same results as `int`. And you can use `int_fast16_t`, if you fear of performance degradation. Besides, there are 64-bit platforms (like PowerPC), where `int`'s performance is not ideal, because it is 32-bit. Unfortunately, `int` doesn't mean register-width for 64-bit architectures. I've done optimizations with changing `int` to `s64`. – geza Feb 11 '18 at 18:51
  • 1
    `movslq` is only necessary there because the x86-64 ABI calling convention allows for garbage in the upper 32 bits of registers used to pass 32-bit arguments. If you do something like `int k = 4; return p[k];` there will be no need to sign-extend it. – Tavian Barnes Feb 11 '18 at 19:25
  • 4
    By forcing the use of a 64-bit type where a 32-bit type would do, you immediately double your code's data cache footprint, memory throughput requirements, and so on. That seems pretty senseless. – David Schwartz Feb 11 '18 at 19:29
  • @DavidSchwartz: if you responded to my comment: I'm talking about local variables which are put to registers here, not memory. Of course, one should use the smallest data type for storing to memory. But on a 64-bit processor, which doesn't have the intrinsic ability to access lower/higher 32-bits of a register (for example, PowerPC), 32-bit `int`s are not ideal (because of 32-bit maskings and 64-bit sign extensions needed). – geza Feb 11 '18 at 20:14
  • 1
    1. Many APIs use `int`, including parts of the standard library. 2. Because of the default type promotion rules, types narrower than `int` could be widened to `int` or `unsigned int` unless you add explicit casts in a lot of places, and a lot of different types could be narrower than `int` on some implementation somewhere. So, if you care about portability, it’s a minor headache. – Davislor Feb 11 '18 at 21:03
  • 1
    @DavidC.Rankin So according to you the following code is invalid (I believe SE uses markdown so hopefully I can get this to work out): *`char str[]="Hello"; char *p = str + 1; p[-1] = 'W';`* ? Because actually that's perfectly valid. Sometimes invalid doesn't equate to always invalid. So unless that's a brain dead change in C++ from C... – Pryftan Feb 12 '18 at 01:48
  • It's a bit unclear from your question if you mean in this particular place or in general. If in general, then for large data when your algorithm is memory-bound, saving 50% of bandwidth/cache eviction can be a big deal (see for example the x32 ABI). I wouldn't worry about assembly - the esi->rsi mov is 'cheap' and might be optimized out in uops (x86 assembly is JITted to an internal representation) while a load is 4 cycles minimum (L1 hit) and has no upper bound (page fault with swap on a slow backing device). – Maciej Piechotka Feb 12 '18 at 02:04
  • 1
    "Defaulting to `int`" is a corrupt legacy of pre-historic C. The default type in modern C++ code has to be *unsigned*. – AnT stands with Russia Feb 12 '18 at 02:18
  • @Pryftan as long as `p[-1]` (or `*(p - 1)`) satisfies [C11 §6.5.6 Additive operators (p8)](http://port70.net/~nsz/c/c11/n1570.html#6.5.6p8) there is no problem. The original comment was intended in the context of a *declaration* where attempting to declare an array of size zero or less is undefined. – David C. Rankin Feb 12 '18 at 02:36
  • @DavidC.Rankin That's fair enough. I was thinking more generally. It must be said that yesterday was a particularly awful day - or rather I was feeling very awful. You're of course correct in your clarification. And yes of course it does translate to `*(p-1)` but I was using the array syntax given the context... Thank you for clarifying and I apologise if I was presumptuous (it wasn't intended although thinking back to yesterday I think it actually was - even if only subtly or barely). – Pryftan Feb 12 '18 at 14:41
  • @Pryftan By using `char` for character data, you are using another can of worms ;) – Hagen von Eitzen Feb 12 '18 at 18:55
  • @HagenvonEitzen Please elaborate. I have two thoughts: it's some pun I'm missing or you're thinking of C++ and its STL string. For the record I had a bad reaction to C++ and although I understand the intent of std::string and although I have used it I prefer C through and through. There is a third option of course: it's something else entirely. So which is it? – Pryftan Feb 12 '18 at 22:22
  • @DanielLangr Don't bother writing `for(i = count - 1; i >= 0; i--)`, just write `for(i = count; i--; )`. Works independent of whether `i` is signed or unsigned, and you save ten characters... – cmaster - reinstate monica Feb 13 '18 at 14:09
  • 1
    Imho, it was a mistake to not grow `int` to 64 bits. It *should* always have been the natural CPU integer type. As it is, we are left without a type that's defined to be the natural CPU integer type. But, if you are looking at `size_t` as an unsigned substitute, you can also look at `ssize_t` as a signed substitute. No need to drop that ghastly `ptrdiff_t` everywhere. – cmaster - reinstate monica Feb 13 '18 at 14:13
  • @cmaster: If `int` were 64-bit, there would not be enough room for 8, 16 and 32-bit integers, as we only have `char` and `short` below `int`. But a new name could have been invented. Besides, I think there were historical reasons to keep it 32-bit, some of them related to Fortran. But I agree with you, it would be so nice if `int` were still the natural CPU integer type. Concerning `ssize_t`, it does not seem to be in the standard. – InsideLoop Feb 14 '18 at 06:41
  • `ssize_t` is the return type of `read()`, and thus POSIX.1-2001 standard. It may not be available on Windows, though. I wouldn't know. As to whether we need new names, no I don't think so: We have the fixed-size types `int8_t`, `int16_t`, and `int32_t`; nothing ever suggested that these sizes should be available as `char`, `short`, and `int`. The only guarantees by the C standard are that `1 == sizeof(char) <= sizeof(short) <= sizeof(int)`, and that `int` is at least 16 bits. There is nothing there that says that the type sequence must not have holes. – cmaster - reinstate monica Feb 14 '18 at 07:26
  • @cmaster I didn't say that I decrement `i` by one only. For instance, in some multithreaded algorithms data are processed in blocks and some global atomic indexes are decremented by a block size (fetch and sub). Using unsigned integers here might require more `if` conditions inside code. – Daniel Langr Feb 15 '18 at 07:14
  • @cmaster Concerning the pattern `for(i = count; i--; )`, even though it works, I believe it is a disaster in code because it is difficult to read. I would consider it an anti-pattern. What makes me scream is the awful code people are ready to type to defend unsigned integers. There is not a single valid reason to defend them. Even on a 32-bit system, when people want a `std::vector` whose length is >= 2^31, the STL cannot provide that because its implementation uses two pointers, one at the beginning, one at the end. The size is `end - begin` which gives back.... a `ptrdiff_t`. – InsideLoop Feb 15 '18 at 10:05
  • @InsideLoop In that case, I have tough news for you: Your computer cannot calculate with signed integers. It cannot add them, it cannot subtract them, it cannot multiply them, it can only divide them. Each and every signed `int` you use in your code is treated as unsigned by the hardware **until you divide or compare**. Then, and only then, does the hardware interpret the bit pattern as a signed number. When you do `end-begin`, the CPU computes the difference just fine as unsigned arithmetic modulo `2^32`. It is 100% your fault and problem that you subsequently interpret that result as signed. – cmaster - reinstate monica Feb 15 '18 at 10:15
  • @cmaster I agree with you that every operation such as `+`, `-` and `*` are treated modulo `2^n` on the assembly level. The advantage of signed integers over unsigned ones are at the intermediate representation level where the compiler optimizes the code. For the maximum size of a `std::vector`, if you use libc++ (clang) on a 32-bit platform, the method `max_size()` will return 2 147 483 647 which is the largest signed integer. With libstdc++ (gcc), you'll get 4 294 967 295. The `size` method does compute `end - begin` which is a `ptrdiff_t` that is casted to a `size_t`. – InsideLoop Feb 15 '18 at 10:43
  • @cmaster So overflow might happen within `libstdc++` for vectors of char of size more than 2 147 483 648 on a 32-bit platform. I agree that gcc and most compilers will treat this undefined behaviour in a way that gives back the correct result. But I strongly believe that `libstdc++` does not respect the standard here whereas `libc++` does. – InsideLoop Feb 15 '18 at 10:48
  • @cmaster The standard says that the difference of 2 pointers is a `ptrdiff_t` which is a signed integer. So `end - begin` should be treated as a signed integer, not an unsigned one. – InsideLoop Feb 15 '18 at 11:00
  • @InsideLoop To be precise, `ptrdiff_t` must be defined as a signed quantity: When you do `end-begin`, the hardware actually calculates `(end-begin)/sizeof(*begin)`. This is a division, and that division needs to be performed in signed arithmetic in general. So, actually, I was wrong in my last comment: It is not **your** fault, but the **language's** fault for defining the pointer difference the way it does. A correct implementation would need to compare the two pointers first, then compute the absolute difference, and reattach the sign afterwards. But that would be very inefficient. – cmaster - reinstate monica Feb 15 '18 at 12:04
  • @cmaster I was not thinking about the division problem. As I was thinking about array of chars, it is not a problem anyway as `sizeof(char)` is one. – InsideLoop Feb 15 '18 at 12:35
  • @InsideLoop Yes, but `char` is a very special case. And it's a good thing that `charPtrA - charPtrB` yields the same type as `fooPtrA - fooPtrB` in all cases, imho. We have enough special cases in any language, if you ask me, we shouldn't add any unnecessary ones. – cmaster - reinstate monica Feb 15 '18 at 12:50
  • @cmaster Still, if `p1` and `p0` are two pointers of the same type, the difference is a `ptrdiff_t` which is signed. As a consequence indices are naturally signed in `C/C++`. I believe that this choice was made as requesting that `p1` was "larger" than `p0` was not a practical requirement to compute `p1 - p0`. Anyway, my preference of signed over unsigned is mainly related to the "undefined behaviour" of signed overflow which allows many compiler optimizations. – InsideLoop Feb 15 '18 at 12:59

10 Answers

107

There was a discussion on the C++ Core Guidelines about what to use:

https://github.com/isocpp/CppCoreGuidelines/pull/1115

Herb Sutter wrote that gsl::index will be added (in the future maybe std::index), which will be defined as ptrdiff_t.

hsutter commented on 26 Dec 2017

(Thanks to many WG21 experts for their comments and feedback into this note.)

Add the following typedef to GSL

namespace gsl { using index = ptrdiff_t; }

and recommend gsl::index for all container indexes/subscripts/sizes.

Rationale

The Guidelines recommend using a signed type for subscripts/indices. See ES.100 through ES.107. C++ already uses signed integers for array subscripts.

We want to be able to teach people to write "new clean modern code" that is simple, natural, warning-free at high warning levels, and doesn’t make us write a "pitfall" footnote about simple code.

If we don’t have a short adoptable word like index that is competitive with int and auto, people will still use int and auto and get their bugs. For example, they will write for(int i=0; i<v.size(); ++i) or for(auto i=0; i<v.size(); ++i) which have 32-bit size bugs on widely used platforms, and for(auto i=v.size()-1; i>=0; --i) which just doesn't work. I don’t think we can teach for(ptrdiff_t i = ... with a straight face, or that people would accept it.

If we had a saturating arithmetic type, we might use that. Otherwise, the best option is ptrdiff_t which has nearly all the advantages of a saturating arithmetic unsigned type, except only that ptrdiff_t still makes the pervasive loop style for(ptrdiff_t i=0; i<v.size(); ++i) emit signed/unsigned mismatches on i<v.size() (and similarly for i!=v.size()) for today's STL containers. (If a future STL changes its size_type to be signed, even this last drawback goes away.)

However, it would be hopeless (and embarrassing) to try to teach people to routinely write for (ptrdiff_t i = ... ; ... ; ...). (Even the Guidelines currently use it in only one place, and that's a "bad" example that is unrelated to indexing.)

Therefore we should provide gsl::index (which can later be proposed for consideration as std::index) as a typedef for ptrdiff_t, so we can hopefully (and not embarrassingly) teach people to routinely write for (index i = ... ; ... ; ...).

Why not just tell people to write ptrdiff_t? Because we believe it would be embarrassing to tell people that's what you have to do in C++, and even if we did people won't do it. Writing ptrdiff_t is too ugly and unadoptable compared to auto and int. The point of adding the name index is to make it as easy and attractive as possible to use a correctly sized signed type.

Edit: More rationale from Herb Sutter

Is ptrdiff_t big enough? Yes. Standard containers are already required to have no more elements than can be represented by ptrdiff_t, because subtracting two iterators must fit in a difference_type.

But is ptrdiff_t really big enough, if I have a built-in array of char or byte that is bigger than half the size of the memory address space and so has more elements than can be represented in a ptrdiff_t? Yes. C++ already uses signed integers for array subscripts. So use index as the default option for the vast majority of uses including all built-in arrays. (If you do encounter the extremely rare case of an array, or array-like type, that is bigger than half the address space and whose elements are sizeof(1), and you're careful about avoiding truncation issues, go ahead and use a size_t for indexes into that very special container only. Such beasts are very rare in practice, and when they do arise often won't be indexed directly by user code. For example, they typically arise in a memory manager that takes over system allocation and parcels out individual smaller allocations that its users use, or in an MPEG or similar which provides its own interface; in both cases the size_t should only be needed internally within the memory manager or the MPEG class implementation.)
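
For illustration (my sketch, not part of the quote; gsl::index ships with the Microsoft GSL implementation), the recommended loop style looks like this:

#include <cstdio>
#include <vector>
#include <gsl/gsl>  // provides gsl::index, a typedef for std::ptrdiff_t

void print(const std::vector<int>& v) {
  for (gsl::index i = 0; i < static_cast<gsl::index>(v.size()); ++i)
    std::printf("%d\n", v[i]);  // i is signed and pointer-sized; the cast avoids
}                               // the signed/unsigned mismatch mentioned above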

Robert Andrzejuk
  • Thanks for the info. I haven't looked at the GSL lately and I have missed that point. – InsideLoop Feb 11 '18 at 20:12
  • 5
    In response to the quoted text; using a signed index has the obvious problem of causing integer overflow when accessing a container whose size exceeds `SIZE_MAX/2`. I hope there are also other changes to address this problem (e.g. making the maximum size of an object actually be `SIZE_MAX/2` instead of `SIZE_MAX`). – M.M Feb 11 '18 at 20:39
  • @M.M `for(auto i=v.size()-1; i!=(~(size_t)(0)); --i)` is ugly, but mostly works. The problem with `ptrdiff_t` is that ideally you want it to be 1 bit wider than `size_t`, but that's impossible, so it reaches only half of the address space. If you halve `SIZE_MAX` so that it is covered by an `int`, then there is no much point of having unsigned `size_t` and signed `ptrdiff_t` at the same time. – Joker_vD Feb 12 '18 at 00:18
  • @Joker_vD: You don't have to make it that ugly. The proper form is `for (auto i = v.size() - 1; i != -1; --i)`. However, the alternative (and arguably preferable) idiom is `for (auto i = v.size(); i-- > 0; )` – AnT stands with Russia Feb 12 '18 at 02:15
  • 12
    Defining `gsl::index` as `ptrdiff_t` will make it an instant anti-pattern in contexts where negative indexing is inappropriate. It will implicitly label any code that'd use `gsl::index` in such contexts as "garbage quality code". The appropriate type is indeed `std::size_t` and without any doubt it has to be *unsigned*. Everybody understands that *signed* indexing is necessary in some contexts, but making people use signed indexing *by default* is not an option. The need for different types in different contexts is exactly why we don't have a "default" index type. Such a type does not exist. – AnT stands with Russia Feb 12 '18 at 02:20
  • 3
    I don't understand how using `ptrdiff_t` is *embarrasing*. It's fine if Sutter doesn't like it, but it's not a good argument against it. – user694733 Feb 12 '18 at 08:45
  • @user694733 you'll see that a lot of people already struggle with using vectors and smart pointers over C-style arrays and raw owning pointers. I think he's coming from there. – Quentin Feb 12 '18 at 09:36
  • 1
    @AnT Actually not "Everybody understands". This is why a default is needed. Experts with experience don't need to use this. But until everybody in their programming environment understands and is ready to progress to the "next" level, the defaults are needed to guide them. And as is mentioned (on unsigned integers in the STL) in the interview linked in the question - Sutter: "They are wrong." Chandler: "We're Sorry." Sutter: "As Scott (Meyers) would say - we were young." ;-) – Robert Andrzejuk Feb 12 '18 at 09:48
  • 2
    "Because we believe it would be embarrassing to tell people that's what you have to do in C++" -- Indeed, C++ is embarassing enough as it is. Not a sound reason though. I agree with @user694733. – alecov Feb 12 '18 at 16:46
  • @M.M when will a container size exceed SIZE_MAX/2? Answer: when the size of the stored object is 1. Basically a char. For everything else this advice is sound. – Robert Andrzejuk Feb 12 '18 at 17:03
  • @AnT Even better: `for(auto i = v.size(); i--; )` Works correctly without warning with any integer type, signed and unsigned types included. – cmaster - reinstate monica Feb 13 '18 at 14:22
  • 1
    @user694733 You really can tell people with a straight face "oh C++ is perfectly simple. See a simple for loop just requires us to use this 15 character long monstrosity, which from the sounds of it actually should just deal with pointers"? (I'm just going to discount the whole "oh we don't count upwards in C++ because it's just too complicated with the type system, simply count towards 0 [but careful if you don't do it exactly the right way you'll have a bug]" because I think we can all agree that's absolutely insane). – Voo Feb 13 '18 at 18:06
  • @Voo To clarify; I don't have strong opinions on what type to use for indexing. I just think saying something is embarrassing is poor communication, because it doesn't explain what is wrong. And I would never say "C++ is simple", because that would be lying. I don't dislike C++, but it simply has too much quirks to ever consider it simple. – user694733 Feb 14 '18 at 08:11
38

I come at this from the perspective of an old timer (pre C++)... It was understood back in the day that int was the native word of the platform and was likely to give the best performance.

If you needed something bigger, then you'd use it and pay the price in performance. If you needed something smaller (limited memory, or specific need for a fixed size), same thing.. otherwise use int. And yeah, if your value was in the range where int on one target platform could accommodate it and int on another target platform could not.. then we had our compile time size specific defines (prior to them becoming standardized we made our own).

But now, present day, processors and compilers are much more sophisticated and these rules don't apply so easily. It is also harder to predict what the performance impact of your choice will be on some unknown future platform or compiler ... How do we really know that uint64_t for example will perform better or worse than uint32_t on any particular future target? Unless you're a processor/compiler guru, you don't...

So... maybe it's old fashioned, but unless I am writing code for a constrained environment like Arduino, etc. I still use int for general purpose values that I know will be within int size on all reasonable targets for the application I am writing. And the compiler takes it from there... These days that generally means 32 bits signed. Even if one assumes that 16 bits is the minimum integer size, it covers most use cases.. and the use cases for numbers larger than that are easily identified and handled with appropriate types.
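
For the cases that genuinely need a guaranteed size, the standardized descendants of those home-made defines live in <cstdint> today (a small sketch; the variable names are mine):

#include <cstdint>

std::int16_t sensor_reading = 0;   // exactly 16 bits, when the layout demands it
std::int32_t sample_count = 0;     // exactly 32 bits
std::int_fast32_t loop_index = 0;  // at least 32 bits, whatever the target runs fastest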

little_birdie
  • 10
    More to the point, it's just not true any more that `int` is the native word of the platform. As 64-bit has become the norm, legacy considerations have seen `int` get left behind. Of course this is why types like `size_t` exist in the first place - because your system knows best. – Lightness Races in Orbit Feb 11 '18 at 23:24
  • I find this approach to coding disturbing. "It maybe works and I am used to it so I'll use it". You probably don't see arrays of 2^31 + 1 elements very often, but what if you do? With 32 GB of RAM or more it's easy. Also the code using int is underspecified. By using it you say "I know all platforms on which this code will ever be run and the bitness of int on those platforms, and it fits the use". IMO this line of thought is the reason behind portability problems for some major projects. – Uprooted Feb 12 '18 at 11:13
  • 10
    Presumably an array of 2^31+1 elements doesn't just randomly APPEAR in one's code one day without one's knowledge. It's a deliberate design decision, in which case use an adequate type. Do you also explicitly type index values to uint8_t if it's a small array? – little_birdie Feb 12 '18 at 11:32
  • 4
    Sometimes large arrays do *just appear* in code that previously operated on small arrays. It can happen when you buy a new camera, for instance - do you then inspect all your source code to inspect every allocation of `width*height*bpp`? – Toby Speight Feb 12 '18 at 13:33
  • @little_birdie, if I operate on an array of, say, 256 bytes, I still use a full pointer-sized index. I guard array boundaries by assertions if appropriate; using uint8_t is just inviting broken loop conditions. What I say is that you rarely know the possible future uses of your code, so you can't be sure today whether using int instead of ptrdiff_t is a negligible flaw or not. I prefer to avoid even a potential flaw here by using the most correct type. This is not a huge deal ofc, but if you can do 'right' for about the same price as 'almost right', why go with the second? – Uprooted Feb 12 '18 at 14:22
  • 1
    @Bilkokuya Yes, I agree.. I'll improve my answer to be more explicit on this point. Actually I don't presume to know the size of int on absolutely every platform. The fact remains that the use cases for large integer values.. at least in my code.. are quite specific and easily identified when I am coding them. If one prefers to take the pedantic approach and go for the maximum level of explicit typing for a simple loop counter where yea, it's usually going to be 16 bits or less.. I wouldn't say no don't do that.. I just don't consider it to be a clear win for performance. – little_birdie Feb 12 '18 at 17:53
  • 3
    The whole point of the `int_fastN_t` types from stdint.h is “to give the best performance” given a minimum size constraint. Though, the standard is vague about exactly *which* integer operations need to be made fast. – dan04 Feb 12 '18 at 17:54
  • @TobySpeight in such a case you obviously either have input validation to signal unsupported dimension AND you obviously decided what size you wanted to be able to support at the time of writing. At no time does it accidentally happen, unless you believe in accidental design (I call that a bug) – sehe Feb 13 '18 at 18:48
  • 1
    @dan04 "Though, the standard is vague about exactly which integer operations need to be made fast." and there isn't really a "fastest" integer type on current CPUs, They have instructions for both 64-bit and 32-bit operations. 64-bit is perhaps more efficient for indexing operations but 32-bit causes less cache pressure. – plugwash Apr 05 '23 at 21:06
  • @plugwash: Yeah, it seems to be working under the assumption of a pure N-bit processor with all N-bit registers, N-bit ALU, N-bit pointers, and N-bit data bus. But what do you do when those things are *not* all the same size? – dan04 Apr 10 '23 at 22:17
20

Most programs do not live and die on the edge of a few CPU cycles, and int is very easy to write. However, if you are performance-sensitive, I suggest using the fixed-width integer types defined in <cstdint>, such as int32_t or uint64_t. These have the benefit of being very clear in their intended behavior in regards to being signed or unsigned, as well as their size in memory. This header also includes the fast variants such as int_fast32_t, which are at least the stated size, but might be more, if it helps performance.
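
For example, a small sketch of the header in use (the variable names are mine):

#include <cstdint>

std::int32_t offset = -42;    // exactly 32 bits, signed
std::uint64_t file_size = 0;  // exactly 64 bits, unsigned
std::int_fast32_t i = 0;      // at least 32 bits, possibly wider if that is faster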

Eyal K.
  • 10
    Using fixed 64-bit integer types for indices kills performance on 32-bit systems. `size_t` or `ptrdiff_t` are much better in this regard. – nwellnhof Feb 11 '18 at 13:00
  • 2
    Also, the exact-width types are optional and need not exist, so at least purely from the perspective of keeping your code theoretically portable, I don't think it's good to use them without a specific fixed-width requirement. – underscore_d Feb 11 '18 at 13:17
  • 3
    You're mixing up code size and efficiency. Fixed-size types like `int32_t` are in general, significantly *less* efficient, because by forcing an exact size, they constrain the compiler to not use the size that might be more natural or computationally efficient. – Steve Summit Feb 11 '18 at 16:54
  • @nwellnhof: 64 bit indices are faster than signed 32 bit indices (which is what int would be) unless the compiler can prove the index is never negative. – Joshua Feb 11 '18 at 20:24
  • 4
    @Joshua: I make a living doing High Performance Computing, and I can tell you that `nwellnhof` is right: you don't want to use integers whose size is above the word size of the computer unless you really need such a range. It just kills performance. – InsideLoop Feb 11 '18 at 21:59
  • 7
    @Joshua: How can 64-bit anything be faster than 32-bit anything on a 32-bit system? What are _you_ smoking?! – Lightness Races in Orbit Feb 11 '18 at 23:25
  • @LightnessRacesinOrbit I can only think of possible cases where storing the result of a 32 bit computation in a 64 bit int lets the compiler avoid otherwise automatically inserted overflow checking. Since C doesn't do that... – Dan Is Fiddling By Firelight Feb 12 '18 at 18:20
  • @DanNeely: "Overflow checking" isn't really a thing. Perfect example: https://stackoverflow.com/q/48731306/560648 – Lightness Races in Orbit Feb 12 '18 at 18:44
16

No formal reason to use int. It doesn't correspond to anything sane as per the standard. For indices you almost always want a signed pointer-sized integer.

That said, typing int feels like you just said hey to Ritchie, and typing std::ptrdiff_t feels like Stroustrup just kicked you in the butt. Coders are people too; don't bring too much ugliness into their lives. I would prefer to use long or some easily typed typedef like index instead of std::ptrdiff_t.

Uprooted
  • 2
    Members of the C++ committee agree that ptrdiff_t is ugly. So they propose "index" to be used. – Robert Andrzejuk Feb 11 '18 at 11:04
  • Either I did not read your answer in whole or you extended it; I guess it covers the question fully now. I always thought people underestimated the need for elegance in coding, but apparently the committee doesn't. – Uprooted Feb 11 '18 at 12:00
  • 4
    "For indices you almost always want signed machine-word-long type." Which matches the definition of "int"! – Sjoerd Feb 11 '18 at 19:20
  • 5
    @Sjoerd `int` is signed but is 32 bits on most 64-bit platform. – InsideLoop Feb 11 '18 at 20:13
  • 9
    @InsideLoop Sadly, you are correct, but that is due to poor judgment on the part of the tool vendors. On an N-bit platform, `int` should be N-bit **by its definition**. What we need is something that really means what `int` used to mean (before 64-bit tool vendors broke it). – Jon Kalb Feb 11 '18 at 22:06
  • @JonKalb so, `intptr_t` or `ptrdiff_t`. – CAD97 Feb 12 '18 at 02:21
  • @JonKalb I actually prefer LP64 to ILP64, because `signed char`, `short`, `int`, and `long` are integers of different sizes. Of course, in production code I'd prefer `int8_t`, `int16_t`, etc., but these are _typedefs_, so you have to have _some_ type of the right size. In ILP64, I don't see a nice way to get both 16- and 32-bit types, unless they add a `short short` type. I'm also curious as to whether you think m68k's `int` should be a 16- or 32-bit type – Fox Feb 12 '18 at 14:51
  • 2
    YMMD with that line about Ritchie and Stroustrup :-) – cmaster - reinstate monica Feb 13 '18 at 14:26
  • @Sjoerd, didn't know, removed my comment as misleading. – Uprooted Feb 13 '18 at 17:08
12

This is somewhat opinion-based, but alas, the question somewhat begs for it, too.

First of all, you talk about integers and indices as if they were the same thing, which is not the case. For any such thing as "integer of sorts, not sure what size", simply using int is of course, most of the time, still appropriate. This works fine most of the time, for most applications, and the compiler is comfortable with it. As a default, that's fine.

For array indices, it's a different story.

There is to date one single formally correct thing, and that's std::size_t. In the future, there may be a std::index_t which makes the intent clearer on the source level, but so far there is not.
std::ptrdiff_t as an index "works" but is just as incorrect as int since it allows for negative indices.
Yes, this happens to be what Mr. Sutter deems correct, but I beg to differ. Yes, on an assembly language instruction level, this is supported just fine, but I still object. The standard says:

8.3.4/6: E1[E2] is identical to *((E1)+(E2)) [...] Because of the conversion rules that apply to +, if E1 is an array and E2 an integer, then E1[E2] refers to the E2-th member of E1.
5.7/5: [...] If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object [...] otherwise, the behavior is undefined.

An array subscription refers to the E2-th member of E1. There is no such thing as a negative-th element of an array. But more importantly, the pointer arithmetic with a negative additive expression invokes undefined behavior.

In other words: signed indices of whatever size are a wrong choice. Indices are unsigned. Yes, signed indices work, but they're still wrong.

Now, although size_t is by definition the correct choice (an unsigned integer type that is large enough to contain the size of any object), it may be debatable whether it is truly good choice for the average case, or as a default.

Be honest, when was the last time you created an array with 10^19 elements?

I am personally using unsigned int as a default because the 4 billion elements that this allows for is more than enough for (almost) every application, and it already pushes the average user's computer rather close to its limit (if merely subscribing an array of integers, that assumes 16 GB of contiguous memory allocated). I personally deem defaulting to 64-bit indices ridiculous.

If you are programming a relational database or a filesystem, then yes, you will need 64-bit indices. But for the average "normal" program, 32-bit indices are just good enough, and they only consume half as much storage.

When keeping around considerably more than a handful of indices, and if I can afford (because arrays are not larger than 64k elements), I even go down to uint16_t. No, I'm not joking there.

Is storage really such a problem? It's ridiculous to be greedy about two or four bytes saved, isn't it! Well, no...

Size can be a problem for pointers, so sure enough it can be for indices as well. The x32 ABI does not exist for no reason. You will not notice the overhead of needlessly large indices if you have only a handful of them in total (just like pointers, they will be in registers anyway, nobody will notice whether they're 4 or 8 bytes in size).

But think for example of a slot map where you store an index for every element (depending on the implementation, two indices per element). Oh heck, it sure does make a bummer of a difference whether you hit L2 every time, or whether you have a cache miss on every access! Bigger is not always better.
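
As a hypothetical illustration of that slot-map point (the struct and its fields are my invention):

#include <cstdint>

// Two 16-bit indices per element: 4 bytes per entry instead of the 16 bytes
// that two 64-bit indices would take, so four times as many entries per cache line.
struct Slot {
  std::uint16_t data_index;  // position of the element in the dense array
  std::uint16_t generation;  // detects stale handles
};
static_assert(sizeof(Slot) == 4, "expected no padding");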

At the end of the day, you must ask yourself what you pay for, and what you get in return. With that in mind, my style recommendation would be:

If it costs you "nothing" because you only have e.g. one pointer and a few indices to keep around, then just use what's formally correct (that'd be size_t). Formally correct is good, correct always works, it's readable and intelligible, and correct is... never wrong.

If, however, it does cost you (you have maybe several hundred or thousand or ten thousand indices), and what you get back is worth nothing (because e.g. you cannot even store 2^20 elements, so whether you could subscript 2^32 or 2^64 makes no difference), you should think twice about being too wasteful.

Damon
  • 3
    I am sorry to disagree with you on the fact that indices should be unsigned integers. Like many people in the C++ community, you are wrong. Not only Herb Sutter but also Bjarne Stroustrup and Chandler Carruth agree on that point and believe that the STL made the wrong choice. Yes, indices are nonnegative integers. So what? Integer division by 0 implies undefined behavior and nobody felt the need to create a type that does not contain 0. Besides that, if p is a pointer and q = p + n, it turns out that q - p is, by the standard, a `ptrdiff_t`. It sounds natural for `n` to be that type. – InsideLoop Feb 14 '18 at 06:29
11

On most modern 64-bit architectures, int is 4 bytes and ptrdiff_t is 8 bytes. If your program uses a lot of integers, using ptrdiff_t instead of int could double your program's memory requirement.

Also consider that modern CPUs are frequently bottlenecked by memory performance. Using 8-byte integers also means your CPU cache now has half as many elements as before, so now it must wait for the slow main memory more often (which can easily take several hundred cycles).

In many cases, the cost of executing "32-to-64-bit conversion" operations is completely dwarfed by memory performance.

So this is a practical reason int is still popular on 64-bit machines.
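
A sketch of the arithmetic (the sizes shown are the usual ones on LP64/LLP64 platforms, not a guarantee of the standard):

#include <cstddef>
#include <vector>

std::vector<int> a(1000000);             // about 4 MB of element data
std::vector<std::ptrdiff_t> b(1000000);  // about 8 MB for the same element count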

  • Now you may argue about two dozen different integer types and portability and standard committees and everything, but the truth is that for a lot of C++ programs written out there, there's a "canonical" architecture they're thinking of, which is frequently the only architecture they're ever concerned about. (If you're writing a 3D graphics routine for a Windows game, you're sure it won't run on an IBM mainframe.) So for them, the question boils down to: "Do I need a 4-byte integer or an 8-byte one here?"
jick
  • Would that logic extend to preferring to use `uint16_t` if you know that the container will never exceed the limits of that type, etc. ? – M.M Feb 11 '18 at 20:20
  • 1
    Yes, but of course it's more risky: it's very easy to "accidentally" have a container of 32K elements, for example, while something must go seriously wrong to have a vector of 2G elements in many cases. – jick Feb 11 '18 at 20:25
  • 1
    It turns out that I am quite familiar with high performance computing and I have to disagree with you. When your integer is in the register, it will take one register, no matter what size it is. In case you have arrays of integers where memory bandwidth is important and you need to put as many integers as possible into a vector register, there is no reason to choose a 32-bit (if that is the case) integer such as int. For those reasons, you need to use the smallest integer size that can handle your range of integers. The same thing applies for structs when you want to fit as much as possible on a cacheline. – InsideLoop Feb 11 '18 at 20:26
  • @InsideLoop I'm not quite sure what you're disagreeing on. When you need as many integers as possible in the same amount of RAM, you pick the smallest integer that can do the job, which is most often `int`. (Of course sometimes you can do better with `short`, `char`, etc., but you hit diminishing returns.) – jick Feb 11 '18 at 21:14
  • @jick One needs a real example to comment on that. But when you have arrays of integers, you really want to use the smallest integer. Everybody agrees on that point. For instance, to store an 8-bit image, you use an array of `std::uint8_t`. In a struct that you intend to use in an array, size also matters. Other than those cases, I don't see a single example where using `int` over `std::ptrdiff_t` offers a performance advantage. My point is that `int` is only used for historical reasons and because its name is simple. But if we abstract the name, `std::ptrdiff_t` clearly wins. – InsideLoop Feb 11 '18 at 21:54
  • Sorry, I can't follow you. You're saying that size matters, but then you're saying `int` does not offer performance advantage over `ptrdiff_t`? Isn't that a contradiction? (Also please keep in mind that practically everything in a typical C++ program is stored in an array or a vector or an `unordered_map` or a protocol buffer or some kind of container. When your web browser renders this page, it must keep a gazillion instances of the same classes: characters, HTML tags, text boxes, etc. Memory requirements for these little things add up.) – jick Feb 11 '18 at 22:03
  • @jick: There are cases where size does matter, for instance when you are limited by bandwidth or when you use vector instructions. In this case, I don't see any reason why `int` (which is usually 32-bit) should be the good choice. One can even go to smaller types, it could be 8-bit, 16-bit or 32-bit. In that case, I believe that the right type to use is a fixed size integer. But there are many cases where size does not matter. In that case, I feel that the right size for your integer is the register of your CPU, which happens to be the size of `ptrdiff_t` on every common architecture. – InsideLoop Feb 14 '18 at 06:34
5

My advice to you is not to look at assembly language output too much, not to worry too much about exactly what size each variable is, and not to say things like "the compiler feels at home with". (I truly don't know what you mean by that last one.)

For garden-variety integers, the ones that most programs are full of, plain int is supposed to be a good type to use. It's supposed to be the natural word size of the machine. It's supposed to be efficient to use, neither wasting unnecessary memory nor inducing lots of extra conversions when moving between memory and computation registers.

Now, it's true that there are plenty of more specialized uses for which plain int is no longer appropriate. In particular, sizes of objects, counts of elements, and indices into arrays are almost always size_t. But that doesn't mean all integers should be size_t!

It's also true that mixtures of signed and unsigned types, and mixtures of different-size types, can cause problems. But most of those are well taken care of by modern compilers and the warnings they emit for unsafe combinations. So as long as you're using a modern compiler and paying attention to its warnings, you don't need to pick an unnatural type just to try to avoid type mismatch problems.
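
For instance (a sketch; the exact wording varies by compiler), GCC and Clang with -Wall -Wextra flag the classic mismatch:

#include <vector>

void f(const std::vector<int>& v) {
  for (int i = 0; i < v.size(); ++i) {}  // warning: comparison of integer
                                         // expressions of different signedness
}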

Steve Summit
  • 1
    `int` was originally supposed to be the natural word size of the machine; however, that is no longer true for most x64 implementations – M.M Feb 11 '18 at 20:22
  • @M.M: I recall something about older 64 bit machines having sizeof(int) == 8 and int is still the natural machine integer size in x64 (read the disassembly; it prefers 32 bit registers). – Joshua Feb 11 '18 at 20:27
  • @M.M If the natural word size of a particular machine is 64 bits, then indeed `int` should arguably be 64 bits on that machine, and I have no objection to that -- but that's still not a reason *not* to use type `int` in code! But is 64 bits truly the "natural word size" of x86_64? I honestly don't know, although I get the impression that the 64 bits are much more for addresses, not so much for ordinary integral calculations. (But I could be wrong; as I say I don't know.) – Steve Summit Feb 12 '18 at 03:14
4

I don't think that there's a real reason for using int.

How to choose the integer type?

  • If it is for bit operations, you can use an unsigned type, otherwise use a signed one
  • If it is for memory-related thing (index, container size, etc.), for which you don't know the upper bound, use std::ptrdiff_t (the only problem is when size is larger than PTRDIFF_MAX, which is rare in practice)
  • Otherwise use intXX_t or int(_least)/(_fast)XX_t.
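
A short sketch of these rules applied (my example):

#include <cstddef>
#include <cstdint>

std::uint32_t flags = 0;        // bit operations: unsigned
std::ptrdiff_t index = 0;       // memory-related, unknown upper bound
std::int16_t temperature = 25;  // known small range: fixed-size type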

These rules cover all the possible usages for int, and they give a better solution:

  • int is not good for storing memory related things, as its range can be smaller than an index can be (this is not a theoretical thing: for 64-bit machines, int is usually 32-bit, so with int, you can only handle 2 billion elements)
  • int is not good for storing "general" integers, as its range may be smaller than needed (undefined behavior happens if range is not enough), or on the contrary, its range may be much larger than needed (so memory is wasted)

The only reason one could use an int is if one does a calculation and knows that the range fits into [-32767;32767] (the standard only guarantees this range. Note, however, that implementations are free to provide bigger ints, and they usually do so. Currently int is 32-bit on a lot of platforms).

As the mentioned std types are a little bit tedious to write, one could typedef them to be shorter (I use s8/u8/.../s64/u64, and spt/upt ("(un)signed pointer sized type") for ptrdiff_t/size_t. I've been using these typedefs for 15 years, and I've never written a single int since...).
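
Spelled out, such typedefs would look something like this (a sketch; the names come from the paragraph above):

#include <cstddef>
#include <cstdint>

using s8  = std::int8_t;     using u8  = std::uint8_t;
using s16 = std::int16_t;    using u16 = std::uint16_t;
using s32 = std::int32_t;    using u32 = std::uint32_t;
using s64 = std::int64_t;    using u64 = std::uint64_t;
using spt = std::ptrdiff_t;  // "signed pointer-sized type"
using upt = std::size_t;     // "unsigned pointer-sized type"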

geza
  • "standard only gaurentees ..." is not true. The standard says: There are five standard signed integer types : “signed char”, “short int”, “int”, “long int”, and “long long int”.In this list, each type provides at least as much storage as those preceding it in the list. – Robert Andrzejuk Feb 11 '18 at 16:09
  • And the constraints given in the C standard, subclause 5.2.4.2.1. says that "implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown ..." – Robert Andrzejuk Feb 11 '18 at 16:17
  • @RobertAndrzejuk : The standard doesn't say `int` cannot hold 32768, but if one is writing maximally portable code, one must not assume that it can either. (I have given up writing code which will work with a 16-bit `int` though.) – Martin Bonner supports Monica Feb 11 '18 at 16:25
  • @MartinBonner Which values `int` can store is implementation-defined. So instead of "16 bit(..." please say that. And just like in your comment, it is another reason not to use it. – Robert Andrzejuk Feb 11 '18 at 16:50
  • 1
    Ptrdiff_t doesn't have ANY guaranteed minimum, so in that respect it is no better. – Martin Bonner supports Monica Feb 11 '18 at 17:11
  • 2
    @RobertAndrzejuk: that's a different thing. I was talking about `int`. C++ standard: "The signed and unsigned integer types shall satisfy the constraints given in the C standard, subclause 5.2.4.2.1.". And if you look at the C standard, you'll find that INT_MIN/MAX should be at least -32767/32767. That's the guaranteed range of `int`. Of course, there are other types with other range requirements, but this question was about `int`. – geza Feb 11 '18 at 18:10
  • @RobertAndrzejuk: I agree with your comment about 16 bit, I've removed it (But practically that -32767;32767 limit means it is 16 bit...) – geza Feb 11 '18 at 18:14
  • @geza So we agree on what the standard says. My problem with that part of the answer is that it implies that you can only rely on that range when using an int. This is not true. The range depends on the compiler implementation/s used. – Robert Andrzejuk Feb 11 '18 at 20:47
  • @RobertAndrzejuk: Sure (I've edited my answer to be clearer on this part). If portability is not a main concern (or the code is targeted only at 32-bit `int` platforms), one might ignore that `int` could be 16-bit. I suppose that a lot of programs would be broken if suddenly `int` become 16-bit. But the possibility is there, and it takes virtually nothing to use proper types. It's just a habit. – geza Feb 11 '18 at 21:11
  • There is another reason to use `int` that is not listed here. If you want to use the natural word size of the machine. That is what `int` was defined to mean. Alas, this have been broken by poor 64-bit implementations. – Jon Kalb Feb 11 '18 at 22:12
  • @JonKalb: It is not listed, because it is not true, for the very reason you mention: 64-bit platforms. All the implementations I know use 32-bit `int` (I wouldn't call them poor). – geza Feb 11 '18 at 22:32
2

Pro

Easier to type, I guess? But you can always typedef.

Many APIs use int, including parts of the standard library. This has historically caused problems, for example during the transition to 64-bit file sizes.

Because of the default type promotion rules, types narrower than int could be widened to int or unsigned int unless you add explicit casts in a lot of places, and a lot of different types could be narrower than int on some implementation somewhere. So, if you care about portability, it’s a minor headache.

Con

I also use ptrdiff_t for indices, most of the time. (I agree with Google that unsigned indices are a bug attractor.) For other kinds of math, there’s int_fast64_t, int_fast32_t, and so on, which will also be as good as or better than int. Almost no real-world systems, with the exception of a few defunct Unices from last century, use ILP64, but there are plenty of CPUs where you would want 64-bit math. And a compiler is technically allowed, by the standard, to break your program if your int is greater than 32,767.

That said, any C compiler worth its salt will be tested on a lot of code that adds an int to a pointer within an inner loop. So it can’t do anything too dumb. The worst-case scenario on present-day hardware is that it needs an extra instruction to sign-extend a 32-bit signed value to 64 bits. But, if what you really want is the fastest pointer math, the fastest math for values with magnitude between 32 kibi and 2 gibi, or the least wasted memory, you should say what you mean, not make the compiler guess.

Davislor
  • I'm skeptical about `int_fast32_t` etc. It seems to me different situations will involve different relative timings of the types. – M.M Feb 11 '18 at 20:22
  • @M.M The main one I can think of is that a smaller type would fit more elements into the cache, so you should use `int_least32_t` in some circumstances to get fewer cache misses. (Part of what I was getting at with “least wasted memory.”) However, if one size does not fit all, `int` cannot fit all either and is no improvement. And you want to be able to specify the lowest size. – Davislor Feb 11 '18 at 20:23
  • Oh certainly, it's the `fast` variants I'm skeptical of. `int_least32_t` has a different purpose, namely to make the code portable to implementations that don't define a 32-bit type. – M.M Feb 11 '18 at 20:35
  • Still, `int_fast32_t` is the library programmers’ best guess what the fastest type usually is, while many compiler vendors are forced to keep `int` at 32 bits for compatibility. – Davislor Feb 11 '18 at 20:42
2

I guess in 99% of cases there is no reason to use int (or signed integers of other sizes). However, there are still situations when using int is a good option.


A) Performance:

One difference between int and size_t is that i++ can be undefined behavior for int - if i is INT_MAX. This actually might be a good thing, because the compiler can use this undefined behavior to speed things up.

For example, in this question the difference was about a factor of 2 between exploiting the undefined behavior and using the compiler flag -fwrapv, which prohibits this exploit.

If my workhorse for-loop becomes twice as fast by using ints - sure, I will use it.
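
A sketch of the kind of loop involved (not the exact code from the linked question):

// With int, i * stride is assumed never to overflow, so the compiler can
// strength-reduce it to one 64-bit addition per iteration; with -fwrapv
// (or an unsigned index) it must preserve 32-bit wraparound semantics.
double strided_sum(const double* p, int n, int stride) {
  double s = 0.0;
  for (int i = 0; i < n; i++)
    s += p[i * stride];
  return s;
}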


B) Less error-prone code

Reversed for-loops with size_t look strange and are a source of errors (I hope I got it right):

for(size_t i = N-1; i < N; i--){...}  // after the i == 0 iteration, i-- wraps around to SIZE_MAX, so i < N becomes false

By using

for(int i = N-1; i >= 0; i--){...}

you will deserve the gratitude of less experienced C++-programmers, who will have to manage your code some day.


C) Design using signed indices

By using int for indices, one could signal wrong values/out-of-range with negative values, something that comes in handy and can lead to clearer code.

  1. "find index of an element in array" could return -1 if element is not present. For detecting this "error" you don't have to know the size of the array.

  2. binary search could return positive index if element is in the array, and -index for the position where the element would be inserted into array (and is not in the array).

Clearly, the same information could be encoded with positive index-values, but the code becomes somewhat less intuitive.
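
A sketch of point 1 (the helper function is hypothetical):

#include <cstddef>
#include <vector>

// -1 can never be a valid index, so it signals "not found" without
// requiring the caller to know the size of the array.
std::ptrdiff_t find_index(const std::vector<double>& v, double x) {
  for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(v.size()); ++i)
    if (v[i] == x) return i;
  return -1;
}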


Clearly, there are also reasons to choose int over std::ptrdiff_t - one of them is memory bandwidth. There are a lot of memory-bound algorithms; for them it is important to reduce the amount of memory transferred from RAM to cache.

If you know that all numbers are less than 2^31, it would be an advantage to use int, because otherwise half of the memory transfer would be writing only zeros which you already know are there.

One example is compressed sparse row (CSR) matrices - their indices are stored as ints and not long longs. Because many operations with sparse matrices are memory bound, there really is a difference between using 32 or 64 bits.
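
A rough sketch of that CSR layout (the struct is my illustration; I keep the row offsets 64-bit on the assumption that the total number of non-zeros may exceed 2^31):

#include <cstdint>
#include <vector>

struct CsrMatrix {
  std::vector<double> values;           // non-zero entries, row by row
  std::vector<std::int32_t> columns;    // column of each entry: 4 bytes instead of 8
  std::vector<std::int64_t> row_start;  // offset of each row's first entry
};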

ead
  • I completely agree with all your points. Which is the reason why I favor `std::ptrdiff_t` over `std::size_t`. If we use `std::ptrdiff_t`, we get all the benefits of `int` and a wider range. – InsideLoop Feb 14 '18 at 06:21
  • @InsideLoop For memory-bound operations there is a difference whether you have to transfer 32 bits or 64 bits per element. So for this scenario `int` would be a better choice. – ead Feb 17 '18 at 19:17
  • I agree with you that when it's memory bound, or even compute bound and you can use vectorization, the smaller the better. But in this case, I don't think that `int` should be the right choice. I believe that in this case, you should use `std::uint8_t`, `std::uint16_t` or `std::int32_t`. – InsideLoop Feb 18 '18 at 18:23