
I'm currently in the process of converting some uses of unsigned int to size_t in a code base that I have been developing over the years. I understand the difference between the two, and that, for example, unsigned int could be 32-bit while pointers and size_t could be 64-bit. My question is more about where I should use either one and what kind of convention people use for picking between the two.

It's quite clear that memory allocation should take size_t instead of unsigned int as an argument, and that container classes should use size_t for size and indexing, as in the STL. These are the common cases referred to when reading about the benefits of size_t over unsigned int. However, while working on the conversion I stumbled upon quite a few gray areas where I'm not sure which one to use. For example, should a 4x4 matrix row/column index be size_t for consistency even though the index is in the range [0, 3]? Should screen/texture resolution use size_t despite being in the range of a few thousand? In general, if the number of objects is reasonably expected to be in the tens, should I still use size_t for consistency?
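Concretely, the matrix case looks roughly like this (a hypothetical sketch; the struct and member names are illustrative, not from any real code base):

```cpp
#include <array>
#include <cstddef>

// A 4x4 matrix whose row/column indices are provably in [0, 3] --
// which of the two index conventions should it use?
struct Matrix44 {
    std::array<float, 16> m{};
    // Convention A: size_t everywhere, consistent with STL containers.
    float& at(std::size_t row, std::size_t col) { return m[row * 4 + col]; }
    // Convention B: a narrower type, since the range is known to be tiny.
    float& at_narrow(unsigned int row, unsigned int col) { return m[row * 4u + col]; }
};
```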

What kind of coding conventions do you use for picking between unsigned int and size_t? Should everything that represents a size (in bytes or objects) or an index always be size_t, regardless of the reasonably expected range? Is there some widely accepted size_t convention used in well-established libraries that I could follow?

JarkkoL

3 Answers


I think it's simple, although I welcome the slings and arrows.

size_t should be used if it describes something that has a size (a count; a number of things).
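A minimal sketch of that rule, assuming the usual STL containers (the function name is illustrative): sizes, counts, and container indices are all size_t.

```cpp
#include <cstddef>
#include <vector>

// Everything that is a size, count, or index is size_t.
std::size_t count_even(const std::vector<int>& v) {
    std::size_t n = 0;                          // a count of things
    for (std::size_t i = 0; i < v.size(); ++i)  // an index, same type as v.size()
        if (v[i] % 2 == 0) ++n;
    return n;
}
```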

Taryn
Drew Dormann
  • Agreed, typedefs make code easy to read and add meta info to the code, if used wisely. – Aniket Inge Jun 15 '14 at 04:14
  • So would you also use it for index/offset? How about resolution? I guess resolution would be number of pixels/texels in each dimension. – JarkkoL Jun 15 '14 at 04:23
  • @JarkkoL those are great questions too. If they can't be negative, my answer is _yes_. `size_t` is for anything that can be _counted_. – Drew Dormann Jun 15 '14 at 04:24
  • Personally, I don't think that *"it can't be negative"* is sufficient reason for making a variable unsigned, because even variables which should never have a negative value are often involved in calculations which require taking negative numbers into consideration. – Benjamin Lindley Jun 15 '14 at 04:31
  • @BenjaminLindley I can certainly relate to what you're describing. I think there's a distinction between "it shouldn't be negative" and "it can't be negative". The elements in your `vector` or the bytes you've allocated, for example, would be a `size_t`. Meanwhile, I'm happy to upvote a counter-argument. – Drew Dormann Jun 15 '14 at 04:35
  • @DrewDormann: No, I would disagree with that too. The examples you mentioned are exactly the sort of variables I was referring to which are often involved in calculations which could result in negative values. – Benjamin Lindley Jun 15 '14 at 04:37
  • @BenjaminLindley That's a cool observation - I can see that. I've been tempted to also join the "just make everything signed" club, since we've left 8-bit. But I'm faster to follow the standards folks, as I could never teach them anything. – Drew Dormann Jun 15 '14 at 04:40
  • I wonder, though, if generally following this convention would have some widespread performance implications over the entire code base. You are essentially doubling the amount of data being passed around for this type of data vs using `unsigned int`. Also, using `size_t` for loop counters might have a big performance impact on tight loops, I suppose. – JarkkoL Jun 15 '14 at 04:46
  • @DrewDormann: I think you'll find many of the standards folks are in agreement, and see the prevalence of unsigned values in the standard library as a mistake, but it's too hard to correct it now. They were talking about it in a panel discussion at a conference (I believe it was Microsoft's Going Native; I'll try to find it), and they were in agreement for the most part on it. I remember Chandler Carruth and Bjarne Stroustrup specifically talking about it. – Benjamin Lindley Jun 15 '14 at 04:48
  • No, you shouldn't delete or change your answer. I wasn't even responding to your answer, just your comment about *"If they can't be negative, my answer is yes."* -- Actually, I haven't even read the question completely, but it seems to be about choosing between `unsigned int` and `size_t`, in which case your answer is probably correct. – Benjamin Lindley Jun 15 '14 at 04:51
  • @JarkkoL that's probably a new question - whether `size_t` performs poorer than other types on various platforms. In my experience, `size_t` is the "largest, fast" size. – Drew Dormann Jun 15 '14 at 04:52
  • Here's that panel video I was referring to: http://channel9.msdn.com/Events/GoingNative/2013/Interactive-Panel-Ask-Us-Anything -- I'm not sure at what point they're talking about it, but I know it's this video because of the comments underneath. – Benjamin Lindley Jun 15 '14 at 05:04
  • @JarkkoL, all, thanks for the enlightening conversation. There are obviously many philosophies represented; it will be interesting to continue as a fly on the wall as the conversation progresses. My curiosity is heightened when the comparison is between `size_t` and `unsigned int`; in my mind the comparison should be between `size_t :: ssize_t` vs `unsigned int :: signed int`. I agree that there are times when size_t is excessive, when unsigned char would seem to be sufficient. – Mahonri Moriancumer Jun 15 '14 at 05:04
  • @DrewDormann I was thinking more in the context of figuring out what the convention should be and if performance considerations are something to take into account. – JarkkoL Jun 15 '14 at 05:11
  • @JarkkoL on modern systems, both are good and `size_t` is chosen to be performant. That is, `size_t` will be a good size and correct for that purpose. – Drew Dormann Jun 15 '14 at 05:13
  • @DrewDormann Yes, I think on modern 64-bit systems both 32- and 64-bit operations are generally equally performant. The difference is mainly in the cache pressure due to extra storage, but it may not be something to really worry about in this case. – JarkkoL Jun 15 '14 at 05:17
  • @BenjaminLindley it seems that in your provided video your point runs from **0:09:50-13:10**. Thanks for the video. – Drew Dormann Jun 15 '14 at 05:59
  • @DrewDormann And if you watch it some more, Andrei speaks out against the *everything should be signed* argument. I think his comment is something like - *you'll have to pry my unsigned ints out of my cold, dead hands*. I'm with you on this one, if math on an unsigned variable could possibly make it negative, that's a special case that needs special consideration. +1 – Praetorian Jun 15 '14 at 06:08
  • @Praetorian At a different point Chandler says that you should only use unsigned if you want mod 2^n arithmetic, as if that is something rare or strange. Whereas I think you could make a case that is what people want a lot of the time whether they know it or not. – Tim Seguine Jun 15 '14 at 10:23
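The hazard Benjamin Lindley raises in these comments can be sketched in a few lines, assuming nothing beyond the standard library: a vector index "can't be negative", yet the naive backwards loop `for (size_t i = v.size() - 1; i >= 0; --i)` never terminates, because `i >= 0` is always true for an unsigned type and `0 - 1` wraps around to `SIZE_MAX`.

```cpp
#include <cstddef>
#include <vector>

// One wrap-safe idiom: test the index before decrementing it, so the
// loop body never sees a wrapped value and an empty vector is handled.
long long sum_backwards(const std::vector<int>& v) {
    long long total = 0;
    for (std::size_t i = v.size(); i-- > 0; )  // i runs v.size()-1 .. 0
        total += v[i];
    return total;
}
```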

With a 32- to 64-bit port of some legacy code recently on my mind, the key characteristic of size_t to me is that it is always big enough to represent your whole address space.

Any other type you can name (including unsigned long) has the potential to put an artificial limit on your data structures at some point in the future. size_t (and its cousin ptrdiff_t) should be the default basis for data structure construction when you can't define a hard a priori upper bound on the domain.
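That characteristic can be spelled out as a compile-time check. Note the equality below is typical of common flat-memory platforms (e.g. LP64 Linux, LLP64 Windows), not something the standard guarantees:

```cpp
#include <cstddef>

// On common platforms size_t is pointer-sized, so it can index anything
// you can address, while unsigned int may stay at 32 bits after a 64-bit
// port. (Illustrative check, not a portable guarantee.)
static_assert(sizeof(std::size_t) == sizeof(void*),
              "size_t is pointer-sized on this platform");
```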

Drew Hall
  • Not true. It is big enough to handle any single object. There have been systems where the address space was substantially bigger than the largest possible object (when an address space of 640KB was big enough for everyone, but objects could only be 64KB). – gnasher729 Apr 18 '15 at 17:51
  • @gnasher729: Ok, fair point. But I don't know many systems still using a segmented memory model these days. – Drew Hall Apr 19 '15 at 05:11

To me, the question of whether to use an integer smaller than the architectural width is the question of whether you can prove that smaller size to be sufficient.

Take, for example, your 4x4 matrix: Is there a theoretical reason why it must be 4x4 and not, say, 5x5 or 8x8? If there is such a theoretical reason, I have no problem with a smaller integer type. If there is none, use size_t or another type that's at least as wide.

My reasoning is that fixed limits (and fixed integer sizes are just one way to introduce them) are generally sleeping bugs. Someone, someday, will probably find some extreme use case where the assumptions you made to fix the limit don't hold. So you want to avoid them wherever they might crop up. And since I generally don't bother to do a proof for a smaller size (because it gains nothing in performance), I usually end up using full-size integers.
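When there *is* a theoretical bound, one way to make the proof explicit in code is to state the bound and let the compiler check that the narrower type covers it. This is only an illustrative sketch; `kMatrixDim` and `MatrixIndex` are hypothetical names:

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

constexpr std::size_t kMatrixDim = 4;  // fixed by the math of 3D homogeneous transforms
using MatrixIndex = std::uint8_t;      // narrower type, justified by the check below

// The "proof" that the smaller size is sufficient, enforced at compile time.
static_assert(kMatrixDim - 1 <= std::numeric_limits<MatrixIndex>::max(),
              "MatrixIndex must cover every valid row/column index");
```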

cmaster - reinstate monica