270

The C standard guarantees that size_t is a type that can hold any array index. This means that, logically, size_t should be able to hold any pointer type. I've read on some sites that I found on the Googles that this is legal and/or should always work:

void *v = malloc(10);
size_t s = (size_t) v;

So then in C99, the standard introduced the intptr_t and uintptr_t types, which are signed and unsigned types guaranteed to be able to hold pointers:

uintptr_t p = (uintptr_t) v;

So what is the difference between using size_t and uintptr_t? Both are unsigned, and both should be able to hold any pointer type, so they seem functionally identical. Is there any real compelling reason to use uintptr_t (or better yet, a void *) rather than a size_t, other than clarity? In an opaque structure, where the field will be handled only by internal functions, is there any reason not to do this?

By the same token, ptrdiff_t has been a signed type capable of holding pointer differences, and therefore capable of holding almost any pointer, so how is it distinct from intptr_t?

Aren't all of these types basically serving trivially different versions of the same function? If not, why? What can I do with one of them that I can't do with another? If so, why did C99 add two essentially superfluous types to the language?

I'm willing to disregard function pointers, as they don't apply to the current problem, but feel free to mention them, as I have a sneaking suspicion they will be central to the "correct" answer.

Eric
Chris Lutz

8 Answers

258

size_t is a type that can hold any array index. This means that, logically, size_t should be able to hold any pointer type

Not necessarily! Hark back to the days of segmented 16-bit architectures for example: an array might be limited to a single segment (so a 16-bit size_t would do) BUT you could have multiple segments (so a 32-bit intptr_t type would be needed to pick the segment as well as the offset within it). I know these things sound weird in these days of uniformly addressable unsegmented architectures, but the standard MUST cater for a wider variety than "what's normal in 2009", you know!-)

Evan Carroll
Alex Martelli
  • 7
    This, along with the numerous others who jumped to the same conclusion, explains the difference between `size_t` and `uintptr_t` but what about `ptrdiff_t` and `intptr_t` - wouldn't both of these be able to store the same range of values on almost any platform? Why have both signed and unsigned pointer-sized integer types, particularly if `ptrdiff_t` already serves the purpose of a signed pointer-sized integer type. – Chris Lutz Sep 23 '09 at 17:15
  • 9
    Key phrase there is "on *almost* any platform", @Chris. An implementation is free to restrict pointers to the range 0xf000-0xffff - this requires a 16bit intptr_t but only a 12/13-bit ptrdiff_t. – paxdiablo Sep 23 '09 at 20:59
  • 35
    @Chris, only for pointers _inside the same array_ is it well-defined to take their difference. So, on exactly the same segmented 16-bit architectures (array must live inside a single segment but two different arrays can be in different segments) pointers must be 4 bytes but pointer **differences** could be 2 bytes! – Alex Martelli Sep 24 '09 at 01:44
  • 2
    So... sizeof(size_t) <= sizeof(uintptr_t)? – JoshG Aug 15 '13 at 06:10
  • 6
    @AlexMartelli: Except that pointer differences can be positive or negative. The standard requires `size_t` to be at least 16 bits, but `ptrdiff_t` to be at least 17 bits (which in practice means it will probably be at least 32 bits). – Keith Thompson Aug 27 '13 at 23:35
  • 4
    Nevermind segmented architectures, what about a modern architecture like x86-64? Early implementations of this architecture only give you a 48-bit addressable space, but the pointers themselves are a 64-bit data type. The largest contiguous block of memory you could reasonably address would be 48-bit, so I have to imagine `SIZE_MAX` should not be 2**64. This is using flat addressing, mind you; no segmentation is necessary in order to have a mismatch between `SIZE_MAX` and the range of a data pointer. – Andon M. Coleman Nov 02 '13 at 22:36
  • You may find this follow up interesting (quotes your answer) https://retrocomputing.stackexchange.com/q/6975/8579 – Evan Carroll Jul 08 '18 at 00:34
  • @AndonM.Coleman On the x86-64 platform, are there instructions to store/retrieve 48-bit integers, and to do calculations with them? Does the compiler provide a 48-bit integer type ie `uint48_t`? – Craig McQueen Jan 25 '19 at 04:15
  • @KeithThompson: On a freestanding 8-bit or 16-bit platform where no object could exceed 32K, I'd regard an implementation where ptrdiff_t is 16 bits as superior to one which makes it a 24-bit or 32-bit type purely to appease the Standard. – supercat May 29 '21 at 23:08
  • @supercat From C11 7.20.3 Limits of other integer types - 2: _A freestanding implementation need not provide all of these types._ They may just not support `ptrdiff_t` to avoid that problem. – 12431234123412341234123 Aug 12 '21 at 18:31
  • 1
    @12431234123412341234123: The pointer subtraction operator is specified as yielding a result of type `ptrdiff_t`, so that particular type must exist. That type should be large enough to ensure that if `p1` and `p2` point at or just past items in the same array, either the difference will be always be within range of the type, or else implementations will handle pointer-arithmetic scenarios *where the Standard would otherwise impose no requirements* in such a way that pointer and integer wraparounds would cancel each other out when computing `p1+(p2-p1)`, thus yielding `p1`. – supercat Aug 12 '21 at 18:55
  • 1
    @12431234123412341234123: The footnote is intended to indicate that some of the types on the list are specified elsewhere as being optional, not that all of the types on the list are optional. – supercat Aug 12 '21 at 18:58
  • 1
    @AndonM.Coleman: SIZE_MAX is defined as the largest value for `size_t`, regardless of whether any actual objects could be anywhere near that big. – supercat Aug 12 '21 at 19:09
94

Regarding your statement:

"The C standard guarantees that size_t is a type that can hold any array index. This means that, logically, size_t should be able to hold any pointer type."

This is unfortunately incorrect. Pointers and array indexes are not the same thing. It's quite plausible to envisage a conforming implementation that limits arrays to 65536 elements but allows pointers to address any value into a massive 128-bit address space.

C99 states that the upper limit of a size_t variable is defined by SIZE_MAX and this can be as low as 65535 (see C99 TR3, 7.18.3, unchanged in C11). Pointers would be fairly limited if they were restricted to this range in modern systems.

In practice, you'll probably find that your assumption holds, but that's not because the standard guarantees it; it actually doesn't.

ejohnso49
paxdiablo
  • Rather than retype what I said in the comments for Alex Martelli, I'll just say thanks for the clarification, but reiterate the second half of my question (the `ptrdiff_t` vs. `intptr_t` part). – Chris Lutz Sep 23 '09 at 17:21
  • 7
    @Ivan, as with most communication, there needs to be a shared understanding of certain basic items. If you see this answer as "poking fun", I assure you that's a misunderstanding of my intent. Assuming that you're referring to my 'logical fallacy' comment (I can't see any other possibility), that was meant as a factual statement, not some statement made at the expense of the OP. If you'd like to suggest some *concrete* improvement to minimise possibility of misunderstanding (rather than just a general complaint), I'd be happy to consider. – paxdiablo Oct 08 '18 at 21:10
  • 2
    @Ivan, wasn't really happy with the edits you proposed, have rolled back and also tried to remove any unintended offence. If you have any other changes to offer, I'd suggest starting up a chat so we can discuss. – paxdiablo Oct 09 '18 at 05:08
  • 1
    @paxdiablo okay, I guess "this is actually a fallacy" is less patronizing. – ivan_pozdeev Oct 09 '18 at 05:37
43

I'll let all the other answers stand for themselves regarding the reasoning with segment limitations, exotic architectures, and so on.

Isn't the simple difference in names reason enough to use the proper type for the proper thing?

If you're storing a size, use size_t. If you're storing a pointer, use intptr_t. A person reading your code will instantly know that "aha, this is a size of something, probably in bytes", and "oh, here's a pointer value being stored as an integer, for some reason".

Otherwise, you could just use unsigned long (or, in these here modern times, unsigned long long) for everything. But size is not everything: type names carry meaning, which is useful because it helps describe the program.

unwind
  • I agree, but I was considering something of a hack/trick (that I would clearly document, of course) involving storing a pointer type in a `size_t` field. – Chris Lutz Sep 23 '09 at 17:19
  • 1
    @MarkAdler Standard does not require pointers to be representable as integers altogether: *Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.* Thus, only `void*`, `intptr_t` and `uintptr_t` are guaranteed to be able to represent any pointer to data. – Andrew Svietlichnyy Jun 30 '18 at 15:47
  • This is overly naive thinking. E.g. when you need to align generic struct fields, size_t vs pointers might be wrong. You need to use uintptr_t then, because only this guarantees the same alignment and offset. – rurban Feb 19 '21 at 09:24
14

It's possible that the size of the largest array is smaller than a pointer. Think of segmented architectures - pointers may be 32-bits, but a single segment may be able to address only 64KB (for example the old real-mode 8086 architecture).

While these aren't commonly in use in desktop machines anymore, the C standard is intended to support even small, specialized architectures. There are still embedded systems being developed with 8 or 16 bit CPUs for example.

Michael Burr
  • But you can index pointers just like arrays, so should `size_t` also be able to handle that? Or would dynamic arrays in some far-off segment still be limited to indexing within their segment? – Chris Lutz Sep 23 '09 at 06:06
  • Indexing pointers is only technically supported to the size of the array they point to - so if an array is limited to a 64KB size, that's all that pointer arithmetic needs to support. However, MS-DOS compilers did support a 'huge' memory model, where far pointers (32-bit segmented pointers) were manipulated so they could address the whole of memory as a single array - but the arithmetic done to pointers behind the scenes was pretty ugly - when the offset incremented past a value of 16 (or something), the offset was wrapped back to 0 and the segment part was incremented. – Michael Burr Sep 23 '09 at 06:14
  • 8
    Read http://en.wikipedia.org/wiki/C_memory_model#Memory_segmentation and weep for the MS-DOS programmers who died so that we might be free. – Justicle Sep 23 '09 at 06:18
  • Worse was that the stdlib functions didn't take care of the HUGE keyword: 16-bit MS C for all `str` functions, and Borland even for the `mem` functions (`memset`, `memcpy`, `memmove`). This meant you could overwrite part of the memory when the offset overflowed; that was fun to debug on our embedded platform. – Patrick Schlüter Feb 01 '10 at 16:40
  • @Justicle: The 8086 segmented architecture is not well supported in C, but I know of no other architecture which is more efficient in cases where a 1MB address space is sufficient but a 64K one would not be. Some modern JVMs actually use addressing very much like x86 real mode, using shifting 32-bit object references left 3 bits to generate object base addresses in a 32GB address space. – supercat Jun 26 '15 at 22:25
6

I would imagine (and this goes for all type names) that it better conveys your intentions in code.

For example, even though unsigned short and wchar_t are the same size on Windows (I think), using wchar_t instead of unsigned short shows the intention that you will use it to store a wide character, rather than just some arbitrary number.

dreamlax
  • But there's a difference here - on my system, `wchar_t` is much larger than an `unsigned short` so using one for the other would be erroneous and create a serious (and modern) portability concern, whereas the portability concerns between `size_t` and `uintptr_t` seem to lie in the far-off lands of 1980-something (random stab in the dark on the date, there) – Chris Lutz Sep 23 '09 at 06:09
  • Touché! But then again, `size_t` and `uintptr_t` still have implied uses in their names. – dreamlax Sep 23 '09 at 06:17
  • They do, and I wanted to know if there was a motivation for this beyond simply clarity. And it turns out there is. – Chris Lutz Sep 23 '09 at 17:16
4

Looking both backwards and forwards, and recalling that various oddball architectures were scattered about the landscape, I'm pretty sure they were trying to wrap all existing systems and also provide for all possible future systems.

So sure, the way things settled out, we haven't so far needed all that many types.

But even in LP64, a rather common paradigm, we needed size_t and ssize_t for the system call interface. One can imagine a more constrained legacy or future system, where using a full 64-bit type is expensive and they might want to punt on I/O ops larger than 4GB but still have 64-bit pointers.

I think you have to wonder: what might have been developed, what might come in the future. (Perhaps 128-bit distributed-system internet-wide pointers, but no more than 64 bits in a system call, or perhaps even a "legacy" 32-bit limit. :-) Imagine that legacy systems might get new C compilers...

Also, look at what existed around then. Besides the zillion 286 real-mode memory models, how about the CDC 60-bit word / 18-bit pointer mainframes? How about the Cray series? Never mind normal ILP64, LP64, LLP64. (I always thought Microsoft was pretentious with LLP64, it should have been P64.) I can certainly imagine a committee trying to cover all bases...

DigitalRoss
2

size_t vs. uintptr_t

In addition to other good answers:

size_t is defined in <stddef.h>, <stdio.h>, <stdlib.h>, <string.h>, <time.h>, <uchar.h>, and <wchar.h>. It is at least 16 bits wide.

uintptr_t is defined in <stdint.h>. It is optional: a compliant library might not define it, likely because there is no integer type wide enough to round-trip void * → uintptr_t → void *.

Both are unsigned integer types.

Note: the optional companion intptr_t is a signed integer type.

chux - Reinstate Monica
-11
#include <stdio.h>

int main(void) {
  int a[4] = {0, 1, 5, 3};
  int a0 = a[0];      /* ordinary indexing            */
  int a1 = *(a + 1);  /* what a[1] is defined to mean */
  int a2 = *(2 + a);  /* addition commutes...         */
  int a3 = 3[a];      /* ...so this is legal, too     */
  printf("%d %d %d %d\n", a0, a1, a2, a3);
  return a2;
}

Implying that intptr_t must always substitute for size_t and vice versa.

Chris Becke
  • 14
    All this shows is a particular syntax quirk of C. Array indexing is defined in terms of x[y] being equivalent to *(x + y), and because a + 3 and 3 + a are identical in type and value, you can use 3[a] or a[3]. – Fred Nurk Feb 04 '11 at 13:22