
I see several posts (such as size_t vs. uintptr_t) about size_t versus uintptr_t/ptrdiff_t, but none about the relative sizes of these new C99 pointer-size types.

Example machine: vanilla Ubuntu 14 LTS x64, GCC 4.8:

printf("%zu, %zu, %zu\n", sizeof(uintptr_t), sizeof(intptr_t), sizeof(ptrdiff_t));

prints: "8, 8, 8"

This does not make sense to me, as I would expect the difference type, which must be signed, to require more bits than the unsigned pointer type itself.

consider:

NULL - (2^64-1)  /*largest ptr, 64bits of 1's.*/

which, being a negative 2's-complement value, would not fit in 64 bits; hence I would expect ptrdiff_t to be larger than the pointer types themselves.

[A related question is why intptr_t is the same size as uintptr_t... although I was comfortable that this was possibly just to allow a signed type to contain the representation's bits (e.g., using signed arithmetic on a negative pointer would (a) be undefined, and (b) have limited utility, as pointers are by definition "positive").]

thanks!

some bits flipped
  • Your only guarantee between intptr_t and uintptr_t is that it won't be smaller. In fact, all that is really happening is that you're saying (with the uintptr_t) "there will be no signed representation", whatever that happens to be in the implementation (most likely 2's complement for lots of really good reasons) – David Hoelzer Jun 05 '15 at 23:56
  • Do not use `NULL` in an integer expression! If you mean the integer `0`, write it! `NULL` can be `(void *)0`, making the expression result undefined (arithmetic on the _null pointer_ is undefined). – too honest for this site Jun 06 '15 at 00:35
  • The same reason that adding/subtracting two ints or unsigned ints results in the same type, even though you need one more bit to represent the result correctly and avoid overflow. It's just impractical to have a type one bit longer than the unsigned version, and if we could have that type, why not just use that new signed type as an unsigned one when needed? Then we'd need a signed type one more bit longer still, and the recursive problem can't be solved – phuclv Jul 07 '18 at 06:50

2 Answers


Firstly, it is not clear what uintptr_t is doing here. The languages (C and C++) do not allow you to subtract arbitrary pointer values from each other. Two pointers can only be subtracted if they point into the same object (into the same array object); otherwise, the behavior is undefined. This means that these two pointers cannot possibly be farther than SIZE_MAX bytes apart. Note: the distance is limited by the range of size_t, not by the range of uintptr_t. In the general case uintptr_t can be a larger type than size_t. Nobody in C/C++ ever promised you that you should be able to subtract two pointers located UINTPTR_MAX bytes apart.
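To illustrate, here is a minimal sketch (not part of the original answer; the array names are arbitrary) contrasting a defined subtraction within one array with an undefined one across distinct objects:

#include <stddef.h> /* ptrdiff_t */
#include <stdio.h>

int main(void)
{
    int a[10];
    int b[10];
    (void)b; /* only its address matters here */

    ptrdiff_t ok = &a[7] - &a[2]; /* defined: both pointers point into `a`; yields 5 */
    printf("%td\n", ok);          /* %td is the printf conversion for ptrdiff_t */

    /* ptrdiff_t bad = &a[0] - &b[0];  undefined behavior: distinct objects */

    return 0;
}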

(And yes, I know that on flat-memory platforms uintptr_t and size_t are usually the same type, at least by range and representation. But from the language point of view it is incorrect to assume that they always are.)

Your NULL - (2^64-1) (if interpreted as address subtraction) is a clear example of such questionable subtraction. What made you think that you should be able to do that in the first place?

Secondly, after switching from the irrelevant uintptr_t to the much more relevant size_t, one can say that your logic is perfectly valid. sizeof(ptrdiff_t) should be greater than sizeof(size_t) because of an extra bit required to represent the signed result. Nevertheless, however weird it sounds, the language specification does not require ptrdiff_t to be wide enough to accommodate all pointer subtraction results, even if two pointers point to parts of the same object (i.e. they are no farther than SIZE_MAX bytes apart). ptrdiff_t is legally permitted to have the same bit-count as size_t.

This means that a "seemingly valid" pointer subtraction may actually lead to undefined behavior simply because the result is too large. If your implementation allows you to declare a char array of size, say, SIZE_MAX / 3 * 2

char array[SIZE_MAX / 3 * 2]; // Smaller than `SIZE_MAX`; SIZE_MAX itself comes from <stdint.h>

then subtracting perfectly valid pointers to the end and to the beginning of this array might lead to undefined behavior if ptrdiff_t has the same size as size_t

char *b = array;
char *e = array + sizeof array;

ptrdiff_t distance = e - b; // Undefined behavior: the result exceeds PTRDIFF_MAX

The authors of these languages decided to opt for this easier solution instead of requiring compilers to implement support for a [likely non-native] extra-wide signed integer type for ptrdiff_t.

Real-life implementations are aware of this potential problem and usually take steps to avoid it. They artificially restrict the size of the largest supported object to make sure that pointer subtraction never overflows. In a typical implementation you will not be able to declare an array larger than PTRDIFF_MAX bytes (which is about SIZE_MAX / 2). E.g. even if SIZE_MAX on your platform is 2^64-1, the implementation will not let you declare anything larger than 2^63-1 bytes (and real-life restrictions derived from other factors might be even tighter than that). With this restriction in place, any legal pointer subtraction will produce a result that fits into the range of ptrdiff_t.
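You can inspect these limits on your own implementation; a minimal sketch (not from the original answer; the printed values assume a typical LP64 platform such as x86-64 Linux):

#include <stddef.h> /* ptrdiff_t */
#include <stdint.h> /* SIZE_MAX, PTRDIFF_MAX */
#include <stdio.h>

int main(void)
{
    /* On a typical LP64 platform this prints:
       SIZE_MAX    = 18446744073709551615   (2^64 - 1)
       PTRDIFF_MAX = 9223372036854775807    (2^63 - 1, about SIZE_MAX / 2) */
    printf("SIZE_MAX    = %zu\n", SIZE_MAX);
    printf("PTRDIFF_MAX = %td\n", (ptrdiff_t)PTRDIFF_MAX);
    return 0;
}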


AnT stands with Russia
  • "uintptr_t is generally a larger type than size_t" - that is wrong. On 32 bit systems, both are very likely the same size (they are on ARM e.g) and on 64 systems, if you can alloc objects >= 2**32 `char`s (e.g. Linux x64). Actually `uintptr_t` only is guaranteed to convert to/from (any) pointer. Arithmetics on the value stores in not defined. – too honest for this site Jun 06 '15 at 00:17
  • @Olaf: No, it is not wrong. `uintptr_t` has the same size as `size_t` on flat-memory platforms, which is just a coincidental and completely inconsequential property of flat-memory platforms. On segmented-memory platforms `size_t` will typically be smaller than `uintptr_t` (segmented-memory platforms usually support different *memory models*, which will dictate the relative size of `size_t` and `uintptr_t`). And in the general case, from the abstract language point of view, `sizeof(size_t) <= sizeof(uintptr_t)`. – AnT stands with Russia Jun 06 '15 at 00:20
  • So you refer to 8051? Most compilers for this are not even completely standard compliant - for good reasons. Which other _majority_ of segmented platforms do you actually know? x86 has become flat for years now - luckily! – too honest for this site Jun 06 '15 at 00:23
  • `size_t` only has to deal with the number of bytes in a single object. `uintptr_t` (if it exists) has to hold the converted value of a `void*`, which can address any byte of any object. So yes, on a segmented memory system, `size_t` may be substantially smaller than `uintptr_t`. But such systems are not typical. – Keith Thompson Jun 06 '15 at 00:24
  • AnT: To clarify my point: My problem is the term "generally", which implies a majority. Just _that_ is wrong - sorry, I should have elaborated that better. – too honest for this site Jun 06 '15 at 00:26
  • @Olaf: DOS, Win16, segmented IBM mainframes... But it is completely irrelevant what has become what now. I don't care about it at all. As long as the language specification maintains that differentiation, it will be there. This is the whole reason we have separate concepts of `size_t` and `uintptr_t` in the language. If it didn't exist, there wouldn't be any need for `uintptr_t` at all - `size_t` would have served that purpose perfectly well (and many people are known to abuse it for that very purpose). – AnT stands with Russia Jun 06 '15 at 00:26
  • Read my answer; I never said otherwise. I use these myself and am happy stdint.h was added with C99 to clean up that mess of `BYTE`, `U8`, etc. in embedded programming (well, except for some ppl still insisting on their own stuff). – too honest for this site Jun 06 '15 at 00:30
  • A high-quality implementation will not permit the creation of objects larger than `PTRDIFF_MAX`. – R.. GitHub STOP HELPING ICE Jun 06 '15 at 00:51
  • @AnT the part of my app this is getting used in is a chunk of code that is copying code: literally `&labelA - &LabelB` to calculate the actual laid-out size of an object code segment ... clearly **very platform dependent**. What prompted my question here was wanting a signed type as a result from subtracting two pointers - specifically so I could just do an `abs()` and not have to worry about which was larger. What I'm hearing is there is no such standard signed type? – some bits flipped Jun 08 '15 at 21:00
  • The wording here is really confusing: you say "This means that they cannot possibly be farther than SIZE_MAX bytes apart," then you go on to show how they can be farther apart, but that ptrdiff_t is simply undefined because of an implementation decision by the language's authors. I would drop that and start with the implementation decision and show how it affects the language -- it would be much easier to grok the answer that way. – Evan Carroll Jun 15 '18 at 18:06
  • @Evan Carroll: I'm not sure I understand your point. Overflowing `ptrdiff_t` does not mean that the pointers are farther than `SIZE_MAX` bytes apart. When `ptrdiff_t` overflows, it means that pointers are farther than `PTRDIFF_MAX` bytes apart (or `PTRDIFF_MIN`, depending on the sign). `PTRDIFF_MAX` and `PTRDIFF_MIN` don't have to match `SIZE_MAX` by absolute value. Which is why two pointers can still be within `SIZE_MAX` bytes range from each other, and yet overflow `ptrdiff_t` on subtraction. – AnT stands with Russia Jun 15 '18 at 18:22
  • @R..: Remember that most real implementations provide functions like `mmap` that ISO C++ does not standardize. Running under an x86-64 Linux kernel, a 32-bit process can `mmap` more than 2GiB of contiguous memory. There's no reason for the kernel to stop that from working, and no reasonable mechanism for a C++ implementation to insert a wrapper that causes such an `mmap` call to not happen. But it's also not reasonable for the C++ implementation (`g++ -m32`) to use 64-bit `ptrdiff_t` / `ssize_t`. – Peter Cordes Jun 15 '18 at 18:39
  • In fact, 32-bit x86 glibc `malloc` (not just `mmap`) even succeeds for allocations up to ~2.7GiB on my desktop. But [GCC / G++ don't support objects larger than `PTRDIFF_MAX`](https://stackoverflow.com/questions/9386979/what-is-the-maximum-size-of-an-array-in-c#comment85866179_9387041); they implement pointer subtraction without keeping the carry-out from the `sub` when right-shifting for types larger than `char`. IIRC, gcc warns when compiling a program that passes a compile-time constant arg that large to `malloc`; it doesn't go so far as not permitting it, though. – Peter Cordes Jun 15 '18 at 18:42
  • @AnT you deleted the part I was working from: if ptrdiff_t is always one bit larger than size_t, then how do you have size_t - size_t overflow ptrdiff_t, which is signed and one bit larger? – Evan Carroll Jun 15 '18 at 18:45
  • @Evan Carroll: I never said that "ptrdiff_t is *always* one bit larger than size_t". What I meant is that any implementation that would *decide* to 1) allow creating objects as large as `SIZE_MAX`, and 2) guarantee that pointer subtraction always fits into `ptrdiff_t`, would be forced to use a `ptrdiff_t` that is at least one bit larger than `size_t`. However, the language specification does not require implementations to make these guarantees (neither 1 nor 2 is required). And implementations usually don't. – AnT stands with Russia Jun 15 '18 at 18:48
  • And I didn't really delete that part. This point is still present in my answer in the following sentence: "`sizeof(ptrdiff_t)` should be greater than `sizeof(size_t)` because of an extra bit required to represent the signed result." – AnT stands with Russia Jun 15 '18 at 18:49
  • I also don't get this, "In a typical implementation you will not be able to declare an array larger than PTRDIFF_MAX bytes (which is about SIZE_MAX / 2)". I don't think this answer is working for me, I guess it was useful for others. I'm going to go looking for a different explanation to clarify this, may come back to it later. – Evan Carroll Jun 15 '18 at 18:50
  • @Evan Carroll: This is just a real-life fact. Try it: https://godbolt.org/g/vwf6Ts. The moment you try to make that array larger, you'll get an error message about its being "too large". – AnT stands with Russia Jun 15 '18 at 19:06
  • I'm not going to argue with facts, I just don't think the explanation here helps *me* understand it. I'll get back to you and perhaps try to explain this later, or tell you what I am missing. – Evan Carroll Jun 15 '18 at 19:17
  • @Evan Carroll : I'd suggest that maybe, when you have time, you take a look at "three options of dealing with `size_t`/`ptrdiff_t` dilemma" I outlined in this answer: https://stackoverflow.com/a/42594384/187690 – AnT stands with Russia Jun 15 '18 at 19:28
  • @AnT that is an all around **far** better explanation in my eyes. – Evan Carroll Jun 15 '18 at 19:48
  • @AnT roll PTRDIFF_MIN and PTRDIFF_MAX into that and uintptr_t and call it a day with a canonical answer. – Evan Carroll Jun 15 '18 at 19:50
  • @EvanCarroll: An implementation may usefully allow declaration of arrays whose size is larger than `PTRDIFF_MAX` if it can guarantee that given any two pointers `p` and `q` to elements of such an array, p==(p-q)+q. Note that if e.g. `ptrdiff_t` were a signed 16-bit type and `p` was 49152 bytes past `q`, the Standard wouldn't *require* any particular behavior when subtracting p-q, but a common and useful behavior would be to yield -16384. The Standard also wouldn't *require* any particular behavior when adding -16384 to p in that case, but a common useful behavior would be to yield p+49152. – supercat Jun 18 '18 at 17:26
  • @EvanCarroll: For whatever reason, the authors of C11 decided to mandate that ptrdiff_t be able to accommodate values +/- 65535 without regard for whether they have enough memory to accommodate objects that size, but don't mandate that systems where objects can exceed ptrdiff_t bytes must support the behaviors that are necessary to make `(p-q)+q` yield `p` in defined fashion whenever `p` and `q` identify parts of the same array, even if `p-q` overflows. – supercat Jun 18 '18 at 17:28

The accepted answer is not wrong, but it does not offer much insight into why intptr_t, size_t and ptrdiff_t are actually useful, and how to use them. So here it is:

  • size_t is basically the type of a sizeof expression. It is only required to be able to hold the size of the largest object that you can make, including arrays. So if you can only ever use 64k of contiguous memory, then size_t can be as little as 16 bits, even if you have 64-bit pointers.

  • ptrdiff_t is the type of a pointer difference, e.g. &a - &b. And while it is true that 0 - &a is undefined behavior (as is almost everything in C/C++), whatever it is, it must fit into ptrdiff_t. It is usually the same size as pointers, because that makes the most sense. If ptrdiff_t were a weird size, pointer arithmetic itself would break.

  • intptr_t/uintptr_t have the same size as pointers. They fit into the same int*_t pattern, where * is the size of the int. As with all int*_t/uint*_t types, the standard for some reason allows them to be larger than required, but that's very rare.

As a rule of thumb, you can use size_t for sizes and array indices, and use intptr_t/uintptr_t for everything pointer related. Do not use ptrdiff_t.
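A minimal sketch of that rule of thumb (not from the original answer; it assumes uintptr_t exists, which the standard makes optional):

#include <stddef.h> /* size_t */
#include <stdint.h> /* uintptr_t */
#include <stdio.h>

int main(void)
{
    char buf[16];

    for (size_t i = 0; i < sizeof buf; ++i) /* size_t for sizes and indices */
        buf[i] = (char)i;

    uintptr_t bits = (uintptr_t)&buf[0]; /* pointer bits round-trip via uintptr_t */
    char *p = (char *)bits;

    printf("%d\n", p == &buf[0]); /* prints 1: the round-trip compares equal */
    return 0;
}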

Evan Dark
  • *the same size as pointers* You repeat that phrase several times. You're assuming that pointers have one single size. They don't have to. – Andrew Henle Feb 06 '21 at 13:53
  • @AndrewHenle: To be fair, in modern C++ implementations for mainstream real-world CPUs, all pointers are the same size. Flat memory models and byte-addressable machines are basically standard these days, and anything that isn't like that is considered "weird". (But sure, in abstract C++ terms, that's not necessary.) – Peter Cordes Feb 07 '21 at 00:41
  • *while it is true that 0 - &a is undefined behavior [...], whatever it is, must fit into `ptrdiff_t`* - you're implying that UB means there is some result, you just don't know what it is. That's not the case, that would be an "undefined result". UB is worse than that: the program might crash, or other unrelated parts of your code might break. (But crashing on this is only likely with `-fsanitize=undefined`). See http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html and [Does the C++ standard allow for an uninitialized bool to crash a program?](//stackoverflow.com/a/54125820) – Peter Cordes Feb 07 '21 at 00:45
  • @AndrewHenle - If there are multiple pointer sizes, it simply holds the largest. – Evan Dark Feb 08 '21 at 09:54
  • @PeterCordes - As long as the two pointers have the same type you can subtract them. Yes, it's undefined behavior like 90% of C++ but (A - B) + B == A will always hold true regardless. – Evan Dark Feb 08 '21 at 09:59
  • ISO C++ does *not* guarantee that `(A - B) + B == A` if `A-B` produces undefined behaviour. It's allowed to crash or anything else. Of course it's also allowed to Just Work, and that's what will happen on almost any real implementation that isn't intentionally hostile (and that doesn't have UB detection switched on), especially ones with a flat memory model. But it's important to distinguish between in-practice real-world behaviour vs. what's truly guaranteed by ISO C++ for SO answers about language design. – Peter Cordes Feb 08 '21 at 10:12
  • If 90% of the C++ you write has undefined behaviour, you might want to consider doing it more carefully. Most code has plenty of stuff that *could* be UB on some hypothetical implementation, e.g. assumptions about types and other implementation-defined stuff, but only 10% at most of C++ is actually benign UB on the mainstream implementations the developers actually care about running on, hopefully well under 1% but I don't have a good estimate on this. – Peter Cordes Feb 08 '21 at 10:16
  • Yes, ISO C++ does not *guarantee* that `(A - B) + B == A`. However, if that were **not** true, it'd mess up pointer arithmetic and create major bugs everywhere. So even though it has to hold, C++ oddly does not guarantee it. And I meant that 90% of the standard is UB. Calling malloc is UB, because casting a (void *) to anything else than it was before is UB :D – Evan Dark Feb 08 '21 at 10:22
  • @EvanDark [`[u]intptr_t` doesn't even have to exist](https://port70.net/~nsz/c/c11/n1570.html#7.20.1.4). – Andrew Henle Feb 08 '21 at 10:36
  • *Calling malloc is UB, because casting a (void *) to anything else than it was before is UB* Sometimes I wish we could downvote a comment. That comment demonstrates quite a misunderstanding. – Andrew Henle Feb 08 '21 at 10:36
  • @AndrewHenle Sure it does. You are just looking at the *wrong* standard. https://pubs.opengroup.org/onlinepubs/7908799/xsh/inttypes.h.html – Evan Dark Feb 08 '21 at 10:39
  • In the end, the goal of my answer was to give *actually useful advice* on intptr_t/uintptr_t, size_t and ptrdiff_t. The accepted answer already said what the standard has to offer, and it's correct, just not very useful. In contrast my answer tells you what these are and how you should use them. Pick the one you want. – Evan Dark Feb 08 '21 at 10:50