Pointer-to-array overlapping end of array

Question

Is this code correct?

int arr[2];

int (*ptr)[2] = (int (*)[2]) &arr[1];

ptr[0][0] = 0;

Obviously ptr[0][1] would be invalid, since it accesses memory outside the bounds of arr.

Note: There's no doubt that ptr[0][0] designates the same memory location as arr[1]; the question is whether we are allowed to access that memory location via ptr. Here are some more examples of when an expression does designate the same memory location but it is not permitted to access the memory location that way.

Note 2: Also consider **ptr = 0; instead. As pointed out by Marc van Leeuwen, ptr[0] is equivalent to *(ptr + 0); however, ptr + 0 seems to fall foul of the pointer arithmetic section. Using *ptr instead avoids that.

Note: I have tagged both C and C++ however each standard is a bit different in this area, so please indicate which language your answer applies to :) — M.M, Mar 24 '15 at 21:21
No, it is not correct in general to treat an object as though it was an object of some other type. Clearly `arr[1]` is an `int` and not an `int[2]`, nor does it form the initial part of such an object. — Kerrek SB, Mar 24 '15 at 21:27
@KerrekSB it's OK to write, for example, `long x; int *i = (int *)&x;` so long as we do not actually read or write through `i` . (C11 6.3.2.3/7) I suppose a related question would be: is it OK to write `int i; long *x = (long *)&i;` if this isn't an alignment violation on the system, and we do not do any arithmetic or dereferencing on `x` after this? — M.M, Mar 24 '15 at 21:44
Related question: is this line of code correct: `int (*ptr)[2] = (int (*)[2])&arr[0];` or simply `int (*ptr)[2] = (int (*)[2])arr;` — juhist, Mar 24 '15 at 21:50
@juhist Good question. I think it is correct in that I do not see any reason why it should not be :) And it would also be OK to access ints through `ptr` there. — M.M, Mar 24 '15 at 21:56
Yes, &arr is indeed pointer to an int[], but &arr[1] is entirely different: it's a pointer to an int. — juhist, Mar 24 '15 at 22:24
@user657267 how so? The object being accessed is `arr[1]` which has dynamic type `int`, and the lvalue doing the access is `ptr[0][0]` which also has type `int`. `int` and `int` are compatible — M.M, Mar 24 '15 at 22:50
Your intention might be clearer if you add a `typedef` or two to get around C's horrid syntax. I think what you're asking is equivalent to the question of whether, given `typedef char charquad[4];` an expression like `charquad *t = (charquad*)malloc(2);` would be legitimate; I would think it would be equivalent to `typedef struct { char a,b,c,d;} charquad2; charquad2 *t2 = (charquad2*)malloc(2);`, which would I think be invalid if code ever accessed `t2->c` or `t2->d`, but valid if it did not. — supercat, Mar 24 '15 at 23:22
@supercat agree that those are equivalent examples in terms of bounds access (although for those who think there is strict aliasing problems, it may be different) — M.M, Mar 24 '15 at 23:25
Related question: http://stackoverflow.com/questions/29244504/casting-pointer-to-array-into-pointer?noredirect=1 -- this question is dependent on the pointer to an array being the same as a pointer to its initial member (as in have the same address, not as in have the same type), and people are trying to find there a section in the standard which guarantees that. — juhist, Mar 24 '15 at 23:33
@MattMcNabb: A structure type could have aliasing problems beyond those of an array, and given `typedef int int2[2],int4[4];`, I think it's clear that a compiler must assume that writes to an `int*` would be capable of writing to variables of type `int2` or `int4`, and that writes to the latter types could modify the target of an `int*`. I'm not sure whether that would imply that a write to an `int2*` could modify an `int4*` or vice versa. While I think I understand the historical reasons for many of C's quirks better than most people, that doesn't mean... — supercat, Mar 24 '15 at 23:37
...I don't yearn for a move toward a language which behaves sanely. It saddens me to think how many man-hours are wasted because the most common embedded-systems language has no means of specifying what sorts of behavior a program is expecting for things like integer promotion, type layout, etc. One could add to the C language ways of specifying such things such that code would run as-is on implementations that "naturally" do what code requires, and might run erroneously on older C implementations that don't recognize what they're supposed to do, but would either run correctly or... — supercat, Mar 24 '15 at 23:48
@supercat yeah, the main reason for questions like this is so that they can be linked to to avoid pedantic objections to practical answers — M.M, Mar 24 '15 at 23:49
...fail compilation on all compilers conforming to the new standard. — supercat, Mar 24 '15 at 23:51
There is no practical reason why it could not be supported. But in C++ it isn't supported (because the language is very restrictive wrt. `reinterpret_cast`). On the third hand, the lack of support only means that if someone creates a perverse C++ implementation that detects such usage and foils it, then that someone can truthfully call that implementation conforming, which is of no practical importance since even the most challenging successful implementations (so to speak) would never be quite that challenging: it's just way too much. — Cheers and hth. - Alf, Mar 25 '15 at 00:42
@supercat, the 'equivalent' struct could be padded, which would make it nonequivalent... at least I think so. — Samuel Edwin Ward, Mar 25 '15 at 15:15
@SamuelEdwinWard: A struct containing four `char` values would typically not be padded, nor would it typically have an alignment stricter than that of `char`. The standard would not prohibit either of those things, but typically I don't think they'd be an issue. I should perhaps have used the adjective "analogous" for the structure in any case. — supercat, Mar 25 '15 at 16:17
Any access to that memory location via `ptr` is not valid as `ptr` is not actually pointing to `sizeof(int)*2` bytes of memory. — haccks, Mar 25 '15 at 17:01
See also OP's answer to [Is it possible to access two dimensional array with negative indexing?](//stackoverflow.com/q/29242409) — Joseph Quinsey, Jul 30 '18 at 22:17

qeadz · Answer 1 · 2015-03-25T00:34:21.100

Not an answer but a comment that I can't seem to word well without being a wall of text:

Arrays are guaranteed to store their contents contiguously, so that they can be 'iterated over' using a pointer. If I can take a pointer to the beginning of an array and successively increment that pointer until I have accessed every element, then surely that implies the array can be accessed as a series of whatever type it is composed of.

Surely, given the combination of:

1) Array[x] stores its first element at address 'array'
2) Successive increments of a pointer to it are sufficient to access the next item
3) Array[x-1] obeys the same rules

it should be legal to at least look at the address 'array' as if it were of type array[x-1] instead of type array[x].

Furthermore, given the points about being contiguous and how pointers to elements in the array have to behave, surely it must be legal to then group any contiguous subset of array[x] as array[y] where y < x and its upper bound does not exceed the extent of array[x].
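The uncontroversial part of this argument can be sketched in C. The helper `sum_ints` below is a hypothetical name introduced only for illustration: it walks any contiguous run of `int` elements through a plain `int *`, which is well-defined as long as the whole run lies inside one array. It deliberately avoids the pointer-to-array cast the question is about.

```c
#include <stddef.h>

/* Hypothetical helper: sums `count` ints starting at `first`.
 * The caller must ensure [first, first + count) lies inside one array. */
int sum_ints(const int *first, size_t count)
{
    int total = 0;
    /* first + count may be one past the end; it is compared, never dereferenced. */
    for (const int *p = first; p != first + count; ++p)
        total += *p;
    return total;
}
```

For `int b[3] = {1, 2, 3};`, a call such as `sum_ints(&b[1], 2)` visits only `b[1]` and `b[2]`, a sub-range that stays within the bounds of `b`.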

Not being a language-lawyer this is just me spouting some rubbish. I am very interested in the outcome of this discussion though.

EDIT:

On further consideration of the original code, it seems to me that arrays are themselves very much a special case in many regards. They decay to a pointer, and I believe can be aliased as per what I just said earlier in this post.

So without any standardese to back up my humble opinion, an array can't really be invalid or 'undefined' as a whole if it doesn't really get treated as a whole uniformly.

What does get treated uniformly are the individual elements. So I think it only makes sense to talk about whether accessing a specific element is valid or defined.

In the case of this example, if I understood it right, y *does* exceed the extent because it is the second and third elements of a two-element array. — Samuel Edwin Ward, Mar 24 '15 at 22:54
I agree with all your thoughts so long as the group chosen is fully within the bounds of the array (e.g. `int b[3]; int (*p)[2] = (int(*)[2])&b[1];`) however what I'm unsure about is whether it is a problem that original example "looks" out of bounds. — M.M, Mar 24 '15 at 22:55
@MattMcNabb ok true - I ignored the bit which extends out of bounds because it wasn't accessed and then promptly forgot about it when I was considering Kerrek's comment to which this was going to be a reply before it got elevated to an answer for character count reasons. — qeadz, Mar 24 '15 at 23:21

score 3 · Answer 2 · answered Mar 24 '15 at 22:32

For C++ (I'm using draft N4296) [dcl.array]/7 says in particular that if the result of subscripting is an array, it's immediately converted to a pointer. That is, in ptr[0][0], ptr[0] is first converted to int* and only then is the second [0] applied to it. So it's perfectly valid code.

For C (C11 draft N1570) 6.5.2.1/3 states the same.
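The decay described here can be observed on a pointer that really does point at an `int[2]`, where nobody disputes the behaviour. This is only a sketch; the function name is made up for illustration, and it says nothing about whether the cast in the question is allowed.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 when every check below holds. */
int subscript_decays_to_element_pointer(void)
{
    int arr[2] = {10, 20};
    int (*p)[2] = &arr;          /* p genuinely points at an int[2] */

    int *first = p[0];           /* p[0] has type int[2] and decays to int* */

    /* sizeof is one of the contexts in which the array does NOT decay. */
    size_t whole = sizeof p[0];  /* size of the whole array */

    return first == &arr[0]
        && p[0][1] == 20
        && whole == 2 * sizeof(int);
}
```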

answered Mar 24 '15 at 22:32

Anton Savin


I don't see `6.5.2.1/3` stating the same. In my opinion, it applies to multidimensional arrays, not pointers to arrays. – juhist Mar 24 '15 at 22:38
For C++ the OP's code looks like an aliasing violation ([basic.lval]). – Kerrek SB Mar 24 '15 at 22:38
@KerrekSB I don't think so, because again, no access is made through `int[2]` object – Anton Savin Mar 24 '15 at 22:39

score 3 · Answer 3 · edited Jun 20 '20 at 09:12

Yes, this is correct code. Quoting N4140 for C++14:

[expr.sub]/1 ... The expression E1[E2] is identical (by definition) to *((E1)+(E2))

[expr.add]/5 ... If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

There is no overflow here. &*(*(ptr)) == &ptr[0][0] == &arr[1].

For C11 (N1570) the rules are the same. §6.5.2.1 and §6.5.6
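As a small illustration of the quoted identity, the sketch below compares the subscript and pointer-arithmetic spellings on an ordinary `int[2]`, where every pointer stays within the array or one past its end. The function name is hypothetical.

```c
/* Hypothetical helper: returns 1 when all spellings agree. */
int subscript_forms_agree(void)
{
    int arr[2] = {3, 4};

    return arr[1] == *(arr + 1)        /* E1[E2] is *((E1)+(E2)) */
        && 1[arr] == *(1 + arr)        /* addition commutes, so this works too */
        && &arr[1] == arr + 1
        && arr + 2 == &arr[1] + 1;     /* one past the end: formed, not dereferenced */
}
```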

edited Jun 20 '20 at 09:12

Community


answered Mar 24 '15 at 22:46

user4709452


If you think that "the pointer operand and the result point to elements of the same array object, or one past the last element of the array object", then which is that array object? Note that the cited phrase is weird, but it **does not** say "unless evaluation produces an overflow, the behavior is defined". – Marc van Leeuwen Mar 25 '15 at 09:40
@MarcvanLeeuwen I'm wary of applying the quote in 5.7 too broadly. For example, for some large X, `char *ptr = (char *)&X; ptr = ptr + 2;`. In this case neither `ptr` nor `ptr + 2` points to elements of an array object, yet this code should be OK. It seems to me that that quote should be taken as the spirit of the law, not the letter of the law. – M.M Mar 25 '15 at 10:16
@MattMcNabb: if the Standard gives only the spirit of the law, then where shall one find the letter of the law? Also, your example is unconvincing since every object has an underlying representation as an array of `char` values, which can serve as the "array object" for the quote. But for pointers to other types than `char` that is of no help. – Marc van Leeuwen Mar 25 '15 at 10:32
@MarcvanLeeuwen c&v of "every object has an underlying representation as an array of char" ? – M.M Mar 25 '15 at 10:35
@MattMcNabb I guess this is said in 1.8[intro.object] "An _object_ is a region of storage", together with 1.7[intro.memory] "The fundamental storage unit in the C++ memory model is the _byte_", and with some text specifying that bytes can be reliably manipulated as `char` values, which I am sure is said somewhere. As always, much remains to be desired in terms of clarity. But I read this as saying that objects are represented by a contiguous sequence (region) of bytes that provide the storage for the object. – Marc van Leeuwen Mar 25 '15 at 10:49
@MarcvanLeeuwen not an *array object* though. – M.M Mar 25 '15 at 10:51

score 3 · Answer 4 · answered Mar 25 '15 at 09:35

Let me give a dissenting opinion: this is (at least in C++) undefined behaviour, for much the same reason as in the other question that this question linked to.

First let me clarify the example with some typedefs that will simplify the discussion.

typedef int two_ints[2];
typedef int* int_ptr;
typedef two_ints* two_ints_ptr;

two_ints arr;

two_ints_ptr ptr = (two_ints_ptr) &arr[1];

int_ptr temp = ptr[0]; // the two_ints value ptr[0] gets converted to int_ptr
temp[0] = 0;

So the question is whether, although there is no object of type two_ints whose address coincides with that of arr[1] (in the same sense that the address of arr coincides with that of arr[0]), and therefore no object to which ptr[0] could possibly point, one can nonetheless convert the value of that expression to one of type int_ptr (here given the name temp) that does point to an object (namely the integer object also called arr[1]).

The point where I think behaviour is undefined is in the evaluation of ptr[0], which is equivalent (per 5.2.1[expr.sub]) to *(ptr+0); more precisely the evaluation of ptr+0 has undefined behaviour.

I'll cite my copy of the C++ standard, which is not official [N3337], but the language has probably not changed; what bothers me slightly is that the section number does not at all match the one mentioned in the accepted answer of the linked question. Anyway, for me it is §5.7[expr.add]

If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce overflow; otherwise the behavior is undefined.

Since the pointer operand ptr has type pointer to two_ints, the "array object" mentioned in the cited text would have to be an array of two_ints objects. However, there is only one such array here: the fictive array whose unique element is arr, which we are supposed to conjure up in such situations (as per: "pointer to nonarray object behaves the same as a pointer to the first element of an array of length one..."). Clearly ptr does not point to its unique element arr. So even though ptr and ptr+0 are no doubt equal values, neither of them points to an element of any array object at all (not even a fictive one), nor one past the end of such an array object, and the condition of the cited phrase is not met. The consequence is (not that overflow is produced, but) that behavior is undefined.
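For contrast, here is a sketch of the case the "array of length one" rule does cover: a pointer that really points at the two_ints object arr. The typedef is the one used in this answer; the function name is hypothetical. Forming and comparing the one-past-the-end pointer is fine here, which is exactly what cannot be said of a pointer obtained by casting &arr[1].

```c
#include <stddef.h>

typedef int two_ints[2];

/* Hypothetical helper: returns 1 when the checks below hold. */
int one_past_single_object(void)
{
    two_ints arr = {0, 0};
    two_ints *p = &arr;      /* points at the sole element of a notional two_ints[1] */
    two_ints *end = p + 1;   /* one past that element: may be formed and compared */

    /* end is exactly one two_ints away from p; end itself is never dereferenced. */
    return (end - p) == 1
        && (char *)end - (char *)p == (ptrdiff_t)sizeof(two_ints);
}
```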

So behavior is already undefined before the indirection operator * is applied. I would not argue for undefined behavior from the latter evaluation, even though the phrase "the result is an lvalue referring to the object or function to which the expression points" is hard to interpret for expressions that do not refer to any object at all. But I would be lenient in interpreting this, since I think dereferencing a pointer past an array should not itself be undefined behavior (for instance if used to initialise a reference).

This would suggest that if instead of ptr[0][0] one wrote (*ptr)[0] or **ptr, then behaviour would not be undefined. This is curious, but it would not be the first time the C++ standard surprises me.

OK. `*p` is in fact different to `*(p+0)` as you say; but then (also as you point out) `**ptr` would avoid that objection based on pointer arithmetic. ISTR that in C99 there was a clause that an lvalue must designate an object when it is evaluated (which would rule out `*ptr`) however that was changed for C11 because it also ruled out a bunch of what was meant to be legal behaviour. — M.M, Mar 25 '15 at 10:00

score 2 · Answer 5 · answered Mar 24 '15 at 22:24

It depends on what you mean by "correct". You are doing a cast on the ptr to arr[1]. In C++ this will probably be a reinterpret_cast. C and C++ are languages which (most of the time) assume that the programmer knows what he is doing. That this code is buggy has nothing to do with the fact that it is valid C/C++ code.

You are not violating any rules in the standards (as far as I can see).

answered Mar 24 '15 at 22:24

Otomo


My opinion also. If it segfaults, that's an OS issue, and aren't a lot of security hacks basically using pointers to access memory beyond declared array bounds? – jamesqf Mar 25 '15 at 05:05

score 0 · Answer 6 · edited May 23 '17 at 10:28

Trying to answer here why the code works on commonly used compilers:

int arr[2];

int (*ptr)[2] = (int (*)[2]) &arr[1];

printf("%p\n", (void*)ptr);
printf("%p\n", (void*)*ptr);
printf("%p\n", (void*)ptr[0]);

All lines print the same address on commonly used compilers. So, ptr is an object for which *ptr represents the same memory location as ptr on commonly used compilers, and therefore ptr[0] is really a pointer to arr[1], and therefore ptr[0][0] is arr[1]. So, the code assigns a value to arr[1].

Now, let's suppose a perverse implementation where a pointer to an array (NOTE: I'm saying pointer to an array, i.e. &arr, which has the type int(*)[2], not arr, which means the same as &arr[0] and has the type int*) is the pointer to the second byte within the array. Then dereferencing ptr is the same as subtracting 1 from ptr using char* arithmetic. For structs and unions, it is guaranteed that a pointer to such a type is the same as a pointer to its first member, but in the related question "Casting pointer to array into pointer" no such guarantee was found for arrays (i.e. that a pointer to an array would be the same as a pointer to the first element of the array), and as a matter of fact @FUZxxl planned to file a defect report about the standard. For such a perverse implementation, *ptr, i.e. ptr[0], would not be the same as &arr[1]. On RISC processors, it would as a matter of fact cause problems due to data alignment.
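A quick way to check what a given compiler does is to compare the two addresses directly, using a pointer that was obtained from &arr without any questionable cast. This is a sketch with a made-up function name. It is expected to return 1 on mainstream implementations, but whether the standard strictly guarantees the first comparison is precisely what the related question debates, so treat that part as an assumption.

```c
/* Hypothetical helper: returns 1 if &arr and &arr[0] compare equal as void*. */
int array_and_first_element_share_address(void)
{
    int arr[2] = {0, 0};
    int (*whole)[2] = &arr;   /* pointer to the array itself */
    int *first = &arr[0];     /* pointer to its first element */

    return (void *)whole == (void *)first    /* assumed, see the note above */
        && (void *)*whole == (void *)first;  /* *whole is arr, which decays to &arr[0] */
}
```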

Some additional fun:

int arr[2] = {0, 0};
int *ptr = (int*)&arr;
ptr[0] = 5;
printf("%d\n", arr[0]);

Should that code work? It prints 5.

Even more fun:

int arr[2] = {0, 0};
int (*ptr)[3] = (int(*)[3])&arr;
ptr[0][0] = 6;
printf("%d\n", arr[0]);

Should this work? It prints 6.

This should obviously work:

int arr[2] = {0, 0};
int (*ptr)[2] = &arr;
ptr[0][0] = 7;
printf("%d\n", arr[0]);

There's no doubt that `ptr[0][0]` designates the same memory location as `arr[1]`; the question is whether we are allowed to access that memory location via `ptr`. [Here](https://stackoverflow.com/questions/25139579/2d-array-indexing-undefined-behavior) are some more examples of when an expression does designate the same memory location but it is not permitted to access the memory location that way. — M.M, Mar 24 '15 at 22:09
OK. The `5` example is also clearly correct (although I have seen someone argue that `ptr[1] = 5;` would be incorrect) — M.M, Mar 24 '15 at 22:16
The `6` example seems to be essentially the same as my question. Of course, compiler output is no guarantee of correctness. — M.M, Mar 24 '15 at 22:52
Well, to me it isn't apparent that the `5` example is correct. But perhaps I should consider posting another question at Stack Overflow then. — juhist, Mar 24 '15 at 22:55
OK. I have seen some people argue that strict aliasing prevents the `5` example although I disagree — M.M, Mar 24 '15 at 22:56
