51

Consider the following code:

int* p1 = new int[100];
int* p2 = new int[100];
const ptrdiff_t ptrDiff = p1 - p2;

int* p1_42 = &(p1[42]);
int* p2_42 = p1_42 + ptrDiff;

Now, does the Standard guarantee that p2_42 points to p2[42]? If not, is it always true on Windows, Linux, or the WebAssembly heap?

curiousguy
Sergey
  • There isn't even a guarantee that `int` objects are `sizeof(int)` aligned (it's the case on every ABI I know of, but there are exceptions to almost all rules in programming, so some ABI may not be that way); when it isn't the case, the code obviously cannot be guaranteed to work. – curiousguy Jan 29 '19 at 07:57
  • @curiousguy There's no particular reason not to align on byte boundaries on Intel except performance. If instead of `int` we used `struct i5 { int i[5]; };`, in practice `p1` and `p2` would not be `sizeof(i5)` aligned. – Martin Bonner supports Monica Jan 29 '19 at 10:45
  • A follow-up question (though asked earlier): [What is the rationale for limitations on pointer arithmetic or comparison?](https://stackoverflow.com/q/47616508/5376789) – xskxzr Jan 30 '19 at 03:25

4 Answers

56

To add the standard quote:

expr.add#5

When two pointer expressions P and Q are subtracted, the type of the result is an implementation-defined signed integral type; this type shall be the same type that is defined as std::ptrdiff_t in the <cstddef> header ([support.types]).

  • (5.1) If P and Q both evaluate to null pointer values, the result is 0.

  • (5.2) Otherwise, if P and Q point to, respectively, elements x[i] and x[j] of the same array object x, the expression P - Q has the value i−j.

  • (5.3) Otherwise, the behavior is undefined. [ Note: If the value i−j is not in the range of representable values of type std::ptrdiff_t, the behavior is undefined. — end note ]

(5.1) does not apply, as the pointers are not null pointers. (5.2) does not apply, because the pointers do not point into the same array. So we are left with (5.3) – UB.

Max Langhof
  • 5.2 could apply if you have a special allocator (I think) – sudo rm -rf slash Jan 28 '19 at 12:25
  • @sudorm-rfslash: Dangerous territory. Arrays are objects, but allocators only create storage, not objects. The two arrays are two distinct objects. In between, the implementation may have reserved space for its own overhead regardless of the allocator used. Commonly the implementation stores the number of elements to destroy. (There's a bit of a Standards debate about how arrays can formally grow element by element, but that's mostly a `std::vector` thing. `new[100]` is a one-shot operation.) – MSalters Jan 28 '19 at 12:59
  • @sudorm-rfslash 5.2 does not apply even for 2 different subarrays (subobjects of one complete object) of a multidimensional array (e.g. `int a[2][3]; &a[1][0] - &a[0][2];` is UB), and you want it to apply in the case where 2 complete array objects are created in the same buffer (e.g. an array of `unsigned char`)... – Language Lawyer Jan 28 '19 at 13:44
  • @MSalters Why are people so insistent on performing arithmetic on pointers directly anyway? Just cast them to `uintptr_t` and then add and subtract to your heart's content. – Joker_vD Jan 29 '19 at 07:01
  • @Joker_vD: That's not guaranteed to be meaningful. `uintptr_t` has enough bits to hold a pointer value; that's it. – MSalters Jan 29 '19 at 08:22
  • @Joker_vD Mapping from integers to pointers and vice versa is implementation-defined, and the only thing guaranteed is that casting from pointer to integer and back gives the same pointer value (if there is an integer type of suitable size). And GCC, for example, does not guarantee more than the absolute minimum required by the standard, and says that using integers to bypass pointer arithmetic restrictions is UB: https://gcc.gnu.org/onlinedocs/gcc/Arrays-and-pointers-implementation.html – Language Lawyer Jan 29 '19 at 09:25
  • @LanguageLawyer GCC pretends that integer values from pointer casts have an "origin", which AFAIK is a made-up claim not based on anything in the std. – curiousguy Jan 29 '19 at 11:05
  • @curiousguy This made-up claim doesn't conflict with the standard, since it's the territory of UB anyway. – Ruslan Jan 29 '19 at 11:55
  • @MSalters But pointer arithmetic most of the time is guaranteed to be meaning*less*. Hell, reading a pointer value from a properly initialized pointer-typed variable can be UB, which doesn't happen with integers. – Joker_vD Jan 29 '19 at 11:55
  • @Ruslan What exactly has UB? GCC totally made up the idea that an integer has an origin like a pointer. And BTW the origin is a concept made up by the C committee, not based on any actual clause of the std as written. And **the origin of a pointer contradicts the fact that pointers are trivial types.** You can't have an origin and still claim a value is a function of the representation. The thing is a scam. – curiousguy Jan 29 '19 at 14:09
  • @curiousguy Pointer value origin is only relevant when you do integer arithmetic to subvert the rule about UB in pointer arithmetic. If you don't try to convert back from integer to pointer, there's no UB — you just get the integral results you would if you'd written it in assembly. – Ruslan Jan 29 '19 at 14:18
  • @curiousguy The Committee explained their intent, and AFAIK intent >>> wording. – Language Lawyer Jan 29 '19 at 20:46
  • @LanguageLawyer I agree that well-understood intent is more important than the exact words, esp. when there is a consensus over what should be done. I agree that some practices are crazy and should not be supported on purpose, like `while(int i=rand(); i!=&x; i=rand()); int *p = (int*)i; // same as p=&x`, and there is a consensus on that. It's difficult to say what should be supported in a high/low-level programming language; low-level programming clashes with high-level optimizations. I don't think either C or C++ is good at combining both. – curiousguy Jan 29 '19 at 22:09
  • @Ruslan: The Standard makes no attempt to mandate that compilers be suitable for any particular purpose, but instead expects that quality compilers claiming to be suitable for various purposes will uphold the Spirit of C *whether the Standard requires them to do so or not*, including the principle "Don't prevent the programmer from doing what needs to be done". Conversion of pointers to integers in order to do arithmetic on them is a message to any compiler that isn't being willfully deaf about what the programmer is trying to do, i.e. "what needs to be done". – supercat Jan 30 '19 at 07:20
  • @Ruslan: Both the charter and published Rationale documents describe the Spirit of C, even though the Standard itself ignores it. – supercat Jan 30 '19 at 07:21
29
const ptrdiff_t ptrDiff = p1 - p2;

This is undefined behavior. Subtraction between two pointers is well defined only if they point to elements in the same array. ([expr.add] ¶5.3).

When two pointer expressions P and Q are subtracted, the type of the result is an implementation-defined signed integral type; this type shall be the same type that is defined as std::ptrdiff_t in the <cstddef> header ([support.types]).

  • If P and Q both evaluate to null pointer values, the result is 0.
  • Otherwise, if P and Q point to, respectively, elements x[i] and x[j] of the same array object x, the expression P - Q has the value i−j.
  • Otherwise, the behavior is undefined.

And even if there were some hypothetical way to obtain this value legally, the subsequent addition would still be illegal: even a pointer + integer addition is restricted to stay inside the boundaries of the array ([expr.add] ¶4.2):

When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.

  • If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.
  • Otherwise, if P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i+j] if 0≤i+j≤n and the expression P - J points to the (possibly-hypothetical) element x[i−j] if 0≤i−j≤n.
  • Otherwise, the behavior is undefined.
Matteo Italia
  • Is there a reason the standard lets you create a pointer to an element one past the end of an array? – Vaelus Jan 28 '19 at 17:02
  • @Vaelus This makes it easier to write loops which increment a pointer at each step. For example, otherwise `for (char *x = xs; x < (xs + sizeof(xs)); x++) {...}` would be illegal because it increments x past the end of its array just before aborting. – amalloy Jan 28 '19 at 17:24
  • @amalloy *would be illegal because it increments x past the end of its array just before aborting* It would become illegal before the first increment — in `xs + sizeof(xs)`. – Language Lawyer Jan 28 '19 at 17:26
  • @LanguageLawyer But that is [explicitly allowed](http://eel.is/c++draft/expr.add#4.2), or am I misreading? You can point to the hypothetical one-past-the-end element of an array (as long as you don't dereference), so both `xs + sizeof(xs)` as well as `x` being equal to that value are allowed. – Max Langhof Jan 29 '19 at 08:36
  • @MaxLanghof: AFAICT LanguageLawyer is just saying that *if* `xs + sizeof(xs)` were illegal (BUT IT'S NOT), you'd get UB even just at the first evaluation of the condition, just before incrementing, as it's there that the `xs + sizeof(xs)` subexpression is evaluated for the first time. That being said, as shown above, creating a pointer to the "one-past-last" element is explicitly allowed (as long as you don't dereference it) and is a common idiom. – Matteo Italia Jan 29 '19 at 08:47
  • @MaxLanghof As was correctly commented, we are discussing a hypothetical situation in which pointing to just past the last element weren't allowed. @amalloy said that the increment after the last iteration would become invalid in that case, and I corrected that `xs + sizeof(xs)` would cause UB even before the first increment. – Language Lawyer Jan 29 '19 at 09:20
  • @LanguageLawyer Oh right, I somehow totally missed Vaelus' comment - it had to be something obvious. Sorry for the trouble. – Max Langhof Jan 29 '19 at 09:22
  • @Vaelus: Every object has a starting address and an ending address, with the latter pointing "just past" the object. Both addresses are valid, but only the starting address can be used directly to access the object (the ending address can be used to compute the starting address, which can then be used to address the object). – supercat Jan 30 '19 at 07:23
9

The third line is Undefined Behavior, so the Standard allows anything after that.

It's only legal to subtract two pointers that point into (or one past the end of) the same array.

Windows or Linux aren't really relevant; compilers, and especially their optimizers, are what breaks your program. For instance, an optimizer might recognize that p1 and p2 both point to the beginning of an int[100], so p1-p2 has to be 0.

MSalters
7

The Standard allows for implementations on platforms where memory is divided into discrete regions which cannot be reached from each other using pointer arithmetic. As a simple example, some platforms use 24-bit addresses that consist of an 8-bit bank number and a 16-bit address within a bank. Adding one to an address that identifies the last byte of a bank will yield a pointer to the first byte of that same bank, rather than the first byte of the next bank. This approach allows address arithmetic and offsets to be computed using 16-bit math rather than 24-bit math, but requires that no object span a bank boundary. Such a design would impose some extra complexity on malloc, and would likely result in more memory fragmentation than would otherwise occur, but user code wouldn't generally need to care about the partitioning of memory into banks.

Many platforms do not have such architectural restrictions, and some compilers which are designed for low-level programming on such platforms will allow address arithmetic to be performed between arbitrary pointers. The Standard notes that a common way of treating Undefined Behavior is "behaving during translation or program execution in a documented manner characteristic of the environment", and support for generalized pointer arithmetic in environments that support it would fit nicely under that category. Unfortunately, the Standard fails to provide any means of distinguishing implementations that behave in such useful fashion and those which don't.

supercat