Is the behavior of the following program undefined?

#include <stdio.h>

int main(void)
{
    int arr[2][3] = { { 1, 2, 3 },
                      { 4, 5, 6 }
    };

    int *ptr1 = &arr[0][0];      // pointer to first elem of { 1, 2, 3 }
    int *ptr3 = ptr1 + 2;        // pointer to last elem of { 1, 2, 3 }
    int *ptr3_plus_1 = ptr3 + 1; // pointer to one past last elem of { 1, 2, 3 }
    int *ptr4 = &arr[1][0];      // pointer to first elem of { 4, 5, 6 }
//    int *ptr3_plus_2 = ptr3 + 2; // this is not legal

    /* It is legal to compare ptr3_plus_1 and ptr4 */
    if (ptr3_plus_1 == ptr4) {
        puts("ptr3_plus_1 == ptr4");

        /* ptr3_plus_1 is a valid address, but is it legal to dereference it? */
        printf("*ptr3_plus_1 = %d\n", *ptr3_plus_1);
    } else {
        puts("ptr3_plus_1 != ptr4");
    }

    return 0;
}

According to §6.5.6 ¶8:

Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object.... If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

From this, it would appear that the behavior of the above program is undefined; ptr3_plus_1 points to an address one past the end of the array object from which it is derived, and dereferencing this address causes undefined behavior.

Further, Annex J.2 suggests that this is undefined behavior:

An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]) (6.5.6).

There is some discussion of this issue in the Stack Overflow question, One-dimensional access to a multidimensional array: well-defined C?. The consensus here appears to be that this kind of access to arbitrary elements of a two-dimensional array through one-dimensional subscripts is indeed undefined behavior.
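
For contrast, here is a minimal sketch of flat access that never advances a pointer past the end of a row. The helper flat_get and the constants ROWS and COLS are hypothetical names introduced only for this illustration; they are not part of the program above. The helper splits a flat index into a row subscript and a column subscript, so each subscript stays within its own array bound.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative dimensions matching int arr[2][3] from the question. */
enum { ROWS = 2, COLS = 3 };

/* Read the element at flat position i (0 .. ROWS*COLS-1).
 * The index is split into two subscripts, each within its own bound,
 * so no row pointer is ever moved past the end of its row. */
int flat_get(int (*a)[COLS], size_t i)
{
    assert(i < (size_t)ROWS * COLS);
    return a[i / COLS][i % COLS];
}
```

With the array from the question, flat_get(arr, 3) reads arr[1][0] through the second row itself rather than through a pointer derived from the first row.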

The issue, as I see it, is that it is not even legal to form the pointer ptr3_plus_2, so it is not legal to access arbitrary two-dimensional array elements in this way. But it is legal to form the pointer ptr3_plus_1 using this pointer arithmetic. Further, it is legal to compare the two pointers ptr3_plus_1 and ptr4, according to §6.5.9 ¶6:

Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.

So, if both ptr3_plus_1 and ptr4 are valid pointers that compare equal and that must point to the same address (the object pointed to by ptr4 must be adjacent in memory to the object pointed to by ptr3 anyway, since array storage must be contiguous), it would seem that *ptr3_plus_1 is as valid as *ptr4.

Is this undefined behavior, as described in §6.5.6 ¶8 and Annex J.2, or is this an exceptional case?

To Clarify

It seems unambiguous that it is undefined behavior to attempt to access the element one past the end of the final row of a two-dimensional array. My interest is in the question of whether it is legal to access the first element of the intermediate rows by forming a new pointer using a pointer to an element from the previous row and pointer arithmetic. It seems to me that a different example in Annex J.2 could have made this more clear.

Is it possible to reconcile the clear statement in §6.5.6 ¶8 that an attempted dereference of a pointer to the location one past the end of an array leads to undefined behavior with the idea that the pointer past the end of the first row of a two-dimensional array of type T[][] is also a pointer of type T * that points to an object of type T, namely the first element of an array of type T[]?

curiousguy
ad absurdum
  • "Is it UB to access an element one past the end of a row of a 2d array?" It certainly is for the last row. I see no UB for prior rows - it is all contiguous space. – chux - Reinstate Monica Jul 08 '17 at 15:19
  • Your 6.5.6 quote makes it pretty clear that this is UB. The provenance of a pointer matters, not just the address it represents. – T.C. Jul 08 '17 at 15:30
  • I don't know what you are asking. Your question already has the answer explicitly stated. It apparently lists no exceptions. – too honest for this site Jul 08 '17 at 15:45
  • From what I understand, this is an open question, something that hasn't been satisfactorily resolved, and it may be clarified in a future version of the standard. – Dietrich Epp Jul 08 '17 at 16:27
  • @DietrichEpp-- do you know of any references to discussions about this as an active issue with the Standard? – ad absurdum Jul 08 '17 at 17:16
  • @Olaf-- `ptr3_plus_1` is a valid pointer that points one past the end of an array, and so can't be dereferenced; but `ptr3_plus_1` is also a valid pointer that points to the first element of an array, and so should be dereferenceable. I am trying to reconcile what seems to me an apparent contradiction. Perhaps the answer is that `ptr3_plus_1` can't be said to be a pointer to the first element of the second array at all. – ad absurdum Jul 08 '17 at 19:06
  • Why all this code? Wouldn't just `arr[0][3]` suffice? – n. m. could be an AI Jul 02 '18 at 08:41
  • @n.m. That would be a clear direct out of bound access. – curiousguy Jul 02 '18 at 08:44
  • @curiousguy so what's the difference between a clear direct out of bound access and a muddy indirect out of bound access? Is the latter supposed to be more legal somehow? – n. m. could be an AI Jul 02 '18 at 09:10
  • @n.m. If it goes through pointer objects that demonstrably cannot store the bounds, it doesn't seem so clearly illegal. – curiousguy Jul 02 '18 at 16:53
  • @curiousguy pointers *have* bounds per the standard. Whether they are stored or not is immaterial. It also doesn't matter if the line of reasoning you use to show that they are violated is long or short. – n. m. could be an AI Jul 02 '18 at 18:05
  • @n.m. It matters because in practice the bounds couldn't possibly be stored, the user can prove it, and an object is allegedly stored in bytes, so its value should correspond to its representation - not meta information derived at compile time. – curiousguy Jul 02 '18 at 18:27
  • @curiousguy when you say `arr[3][0]` is out of bounds, you are using meta information derived at compile time. The standard doesn't say that certain ways of proving UB from compile time meta information are allowed and others are not. – n. m. could be an AI Jul 02 '18 at 18:34
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/174184/discussion-between-curiousguy-and-n-m). – curiousguy Jul 02 '18 at 18:41

2 Answers

So, if both ptr3_plus_1 and ptr4 are valid pointers that compare equal and that must point to the same address

They are.

it would seem that *ptr3_plus_1 is as valid as *ptr4.

It is not.

The pointers are equal, but not equivalent. The trivial well-known example of the distinction between equality and equivalence is negative zero:

double a = 0.0, b = -0.0;
assert (a == b);
assert (1/a != 1/b);

Now, to be fair, there is a difference between the two cases: positive and negative zero have different representations, whereas ptr3_plus_1 and ptr4 have the same representation on typical implementations. This is not guaranteed, and on implementations where they would have different representations, it should be clear that your code might fail.

Even on the typical implementations, while there are good arguments to be made that the same representation implies equivalent values, to the best of my knowledge, the official interpretation is that the standard does not guarantee this, therefore programs cannot rely on it, therefore implementations can assume programs do not do this and optimise accordingly.
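
As a rough sketch of the negative-zero comparison above, the following separates "compares equal" from "has the same representation". It assumes an implementation with IEEE 754 doubles, where 0.0 and -0.0 differ in the sign bit; the function names same_representation and demo_zero are made up for this illustration.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the object representations of a and b are identical
 * byte for byte, 0 otherwise. */
int same_representation(double a, double b)
{
    return memcmp(&a, &b, sizeof a) == 0;
}

/* Assumption: IEEE 754 doubles, so -0.0 is a distinct negative zero. */
void demo_zero(void)
{
    double a = 0.0, b = -0.0;
    assert(a == b);                      /* the values compare equal */
    assert(!same_representation(a, b));  /* yet the stored bytes differ */
}
```

For the two pointers in the question the situation is the reverse on typical implementations: the representations usually match, and the open point is whether the values are nevertheless not equivalent.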

  • I had wondered along these lines, yet doesn't [§6.2.5 28](http://port70.net/~nsz/c/c11/n1570.html#6.2.5p28) apply here? "Similarly, pointers to qualified or unqualified versions of compatible types shall have the same representation and alignment requirements." – ad absurdum Jul 08 '17 at 17:34
  • @DavidBowling That's something totally different. That's what's meant to imply that `char *s = "Hello"; const char *c; memcpy (&c, &s, sizeof c);` is safe despite copying to a different type. –  Jul 08 '17 at 17:38
  • @PeterJ The whole point was to pass in pointers to pointers, in order to copy the pointer value itself. No, I'm not copying 1 char, `sizeof c` is `sizeof(const char *)`, not `sizeof(char)`. –  Jul 08 '17 at 19:51
  • Alright; I see what you mean about §6.2.5 28. Ordinarily I need no convincing that something the Standard deems UB is in fact UB; but here it seems difficult to grant that a pointer of the correct type, `ptr3_plus_1`, can't be dereferenced in the same way as `ptr4`, unless it can be said that the first pointer doesn't point to an object at all. Maybe this is what your suggestion about differing representations gets you.... IAC, UV for now. I am going to wait a bit before accepting to see if anyone else can suggest a resolution. – ad absurdum Jul 08 '17 at 20:36
  • @DavidBowling The standard indeed doesn't say that it doesn't point to an object. It doesn't say that it does point to an object either, you're inferring that from the equivalence between `ptr3_plus_1` and `ptr4`, but the point of my answer is that that equivalence doesn't exist. :) –  Jul 08 '17 at 20:41
  • Well, not just from the fact that they compare equal, but also from the fact that the object pointed to by `ptr4` is one past the object pointed to by `ptr3`, and so my expectation was that `ptr3_plus_1` should point to the object pointed to by `ptr4`. The legality of the comparison just seemed like supporting evidence. – ad absurdum Jul 08 '17 at 20:45
  • @DavidBowling In what way is it "one past" the object pointed to by `ptr3` though? When the standard talks about "one past", it refers to elements of arrays, and `*ptr3` and `*ptr4` are not elements of the same array (even though the arrays they're elements of are themselves elements of the same array). If the fact that they're adjacent in memory could be enough, then given `int x, y;`, do you think `*(&x+1 == &y ? &x+1 : &y) = 4;` is a valid way of setting `y`? –  Jul 08 '17 at 21:11
  • Yes, one past in terms of adjacency in memory. Such a scenario as your `int x, y;...` had crossed my mind, since the Standard says that a pointer to an object that isn't an element of an array behaves as a pointer to the first element of an array of length 1 (for purposes of pointer arithmetic). But there is no guarantee that `x` and `y` are contiguous in memory, while there is a guarantee that `arr[0][2]` and `arr[1][0]` from my example code are contiguous in memory. – ad absurdum Jul 08 '17 at 21:22
  • @DavidBowling Indeed, in my example it's not guaranteed, that's why I had to add a conditional to check whether it actually is, which the example in your question doesn't need. And thinking about it, although based on the current text of the standard I do not believe this distinction is significant, I believe the intent is that in very closely related examples, the distinction *is* significant, so perhaps that example is best ignored. Perhaps I could have used overlapping `union` members instead. –  Jul 08 '17 at 21:31
  • Not sure I follow your `union` example. I notice that in §6.5.3.2 4 the Standard says that if an invalid value has been assigned to a pointer, dereferencing leads to UB. This may fit with your suggestion about representations, in that `ptr3_plus_1` and `ptr4` may have different representations and different values, yet compare equal. Do you agree? – ad absurdum Jul 09 '17 at 15:05
  • @DavidBowling With `union`, I meant I could have picked something like `union { int a; int b[2]; } u;` and then `(&u.a)[1]`. `u.a` is guaranteed to be followed by `u.b[1]` in memory. For the rest of your comment, I agree, but with the added note that it's also possible they could have the same representation, different (meaning "not equivalent") values, and compare equal. –  Jul 09 '17 at 16:10
4

A debugging implementation might use "fat" pointers. For example, a pointer may be represented as a tuple (address, base, size) to detect out-of-bounds access. There is absolutely nothing wrong or contrary to the standard about such representation. So any pointer arithmetic that brings the pointer outside the range of [base, base+size] fails, and any dereference outside of [base, base+size) also fails.

Note that base and size are not the address and the size of the 2D array but rather of the array that the pointer points into (the row in this case).

It might sound trivial in this case, but when deciding whether a certain pointer construction is UB or not, it is useful to mentally run your example through this hypothetical implementation.
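
A minimal sketch of such a hypothetical implementation, modelled by hand in ordinary C, might look like the following. The struct fat_ptr and the functions fat_from_array, fat_add, and fat_load are names invented for this illustration; no real compiler is claimed to work exactly this way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fat pointer: an address plus the bounds of the array
 * it was derived from (here a single row, not the whole 2D array). */
struct fat_ptr {
    int *addr;    /* current position */
    int *base;    /* first element of the underlying array */
    size_t size;  /* number of elements in that array */
};

/* Derive a fat pointer to element 0 of an array of n ints. */
struct fat_ptr fat_from_array(int *array, size_t n)
{
    assert(array != NULL);
    struct fat_ptr p = { array, array, n };
    return p;
}

/* Arithmetic may only land within [base, base+size].
 * Returns 1 on success, 0 (pointer unchanged) on failure. */
int fat_add(struct fat_ptr *p, ptrdiff_t delta)
{
    ptrdiff_t pos = (p->addr - p->base) + delta;
    if (pos < 0 || (size_t)pos > p->size)
        return 0;
    p->addr = p->base + pos;
    return 1;
}

/* Dereference is only allowed within [base, base+size).
 * Returns 1 and stores the value on success, 0 on failure. */
int fat_load(const struct fat_ptr *p, int *out)
{
    size_t pos = (size_t)(p->addr - p->base);
    if (pos >= p->size)
        return 0;
    *out = *p->addr;
    return 1;
}
```

Run through this model with a fat pointer derived from arr[0], advancing to one past the last element of the row is accepted by fat_add, the load through that pointer is rejected by fat_load, and advancing one element further is rejected by fat_add. A fat pointer derived from arr[1] holds the same address and loads 4 without complaint.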

n. m. could be an AI