Based on this question (strange output issue in C), there was an answer (provided by @Lundin) about this line:

int *ptr = (int*)(&a+1);

where he said:

the cast (int*) was hiding this bug.

So I came up with the following:

#include <stdio.h>

int main( void ){
    int a[5] = {1,2,3,4,5};

    int *ptr = *( ( &a ) + 1 );
    printf("%d", *(ptr-1) );
}

I would like to know if this:

int *ptr = *( ( &a ) + 1 );

Is well-defined by the Standard?

EDIT:

At some point @chux pointed to §6.3.2.3 ¶7, which reads:

A pointer to an object type may be converted to a pointer to a different object type. If the
resulting pointer is not correctly aligned for the referenced type, the behavior is
undefined. Otherwise, when converted back again, the result shall compare equal to the
original pointer. When a pointer to an object is converted to a pointer to a character type,
the result points to the lowest addressed byte of the object. Successive increments of the
result, up to the size of the object, yield pointers to the remaining bytes of the object.

But I am not sure if I understand it right.

curiousguy
Michi
  • The types don't seem to match – Eugene Sh. Jun 26 '18 at 17:50
  • @EugeneSh. The types do not need to match, just align. (C11 §6.3.2.3 7) – chux - Reinstate Monica Jun 26 '18 at 17:52
  • Anyway, it is dereferencing an invalid pointer, isn't it? – Eugene Sh. Jun 26 '18 at 17:53
  • @EugeneSh. It moves back 1 in the print statement bringing it back to the end of the array, which should be valid. – Christian Gibbons Jun 26 '18 at 17:55
  • `( &a ) + 1` - is pointing past the array. Then dereferenced and assigned to `ptr`. – Eugene Sh. Jun 26 '18 at 17:56
  • @EugeneSh. `&a` has type `int (*)[5]`, so `*((&a) + 1)` has type `int *`, and points to the address one past the end of the array. – user3386109 Jun 26 '18 at 17:57
  • @EugeneSh. Ahhh, I thought `int *ptr = *( ( &a ) + 1 );` was `int *ptr = (int*)(&a+1);` as [here](https://stackoverflow.com/q/51043235/2410359). Yes it is a problem or at least a concern, hmmmm. – chux - Reinstate Monica Jun 26 '18 at 17:59
  • @user3386109 `(&a) + 1` has type `int (*)[5]` and is pointing past the array. `*((&a) + 1)` is dereferencing it. – Eugene Sh. Jun 26 '18 at 17:59
  • @EugeneSh. It's dereferencing a pointer-to-a-pointer-to-int, yielding a pointer-to-int. – user3386109 Jun 26 '18 at 18:03
  • @EugeneSh. So pointing past the array is `UB` even if you do not use it? In the `printf` call we have `*(ptr-1)`. – Michi Jun 26 '18 at 18:04
  • But you do use it. At least that's my understanding. You assign the value pointed by it to `ptr`. – Eugene Sh. Jun 26 '18 at 18:05
  • @EugeneSh. So you are saying that the program is `UB` ? – Michi Jun 26 '18 at 18:05
  • @Michi Making a pointer "1 past" the address of an object is not a problem. Pointer math is well defined. How that pointer might get used is an issue, especially if it is de-referenced, as here. – chux - Reinstate Monica Jun 26 '18 at 18:07
  • @chux If pointer math is OK, then this program is well-defined. In `printf()` I used `*(ptr-1)`, which should be fine, I think. – Michi Jun 26 '18 at 18:08
  • What exactly are we getting when we dereference `((&a) + 1)`, though? Because it doesn't seem to be `a[5]` (which does not exist), but rather a pointer to `a[5]`. – Christian Gibbons Jun 26 '18 at 18:08
  • @EugeneSh. I guess I see your point. The question is whether "yielding a pointer-to-int", is separate and distinct from "dereferencing a pointer-to-a-pointer-to-int". I contend that all the code does is compute an address, and isn't dereferencing anything. – user3386109 Jun 26 '18 at 18:11
  • @Michi `*(ptr-1)` is not the tricky bit. `*( ( &a ) + 1 )` is the crux of the issue. – chux - Reinstate Monica Jun 26 '18 at 18:11
  • @chux Well, I need to know if this line `int *ptr = *( ( &a ) + 1 );` is `UB` or not. Moreover, the same goes for the whole program. – Michi Jun 26 '18 at 18:12
  • Can we agree that `int *ptr = *(&a);` returns a pointer to the first element of `a`? – Christian Gibbons Jun 26 '18 at 18:15
  • IMO `*( ( &a ) + 1 )` is UB because, although "1 past" `&a[5]` is well defined, `( &a ) + 1` is a pointer to a whole [array 5](https://cdecl.org/?q=int+a%5B5%5D) past `a[]`. Hmmmm - I just do not see de-referencing any pointer in that "1 past" zone as legit. The general idea is to think of `a[]` existing near the end of memory - how much more pointer math is allowed? – chux - Reinstate Monica Jun 26 '18 at 18:16
  • @chux But that address is still one past the end of an array. The array just happens to only have one element. For example, given an array `int b[1][5]`, the code is equivalent to `int *ptr = b[1]`. – user3386109 Jun 26 '18 at 18:22
  • @user3386109 Fair point. – chux - Reinstate Monica Jun 26 '18 at 18:32

3 Answers

This expression invokes undefined behavior as a result of the dereference operator *:

int *ptr = *( ( &a ) + 1 );

First, let's start with ( &a ) + 1. This part is valid. &a has type int (*)[5], i.e. a pointer to an array of 5 int. Performing pointer arithmetic by adding 1 is valid, even though a is not an element of an array.

In section 6.5.6 of the C standard detailing Additive Operators, paragraph 7 states:

For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.

It's also allowed to create a pointer that points to one element past the end of an array. So &a + 1 is allowed.
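As a minimal sketch of the point above (the helper name `step_of_array_pointer` is hypothetical, not from the original post), the following only computes `&a + 1` and never applies `*` to it. It measures how far the pointer moved by comparing the two addresses as `char *`:

```c
#include <stddef.h>

/* Hypothetical helper: byte distance between &a and &a + 1. */
size_t step_of_array_pointer(void)
{
    int a[5] = {1, 2, 3, 4, 5};
    int (*p)[5] = &a;     /* pointer to the whole array */
    int (*q)[5] = p + 1;  /* one past the array: computed, never dereferenced */

    /* Both char pointers refer to the bytes of a (or one past them),
       so subtracting them is defined. */
    return (size_t)((char *)q - (char *)p);
}
```

The result equals `sizeof a`, i.e. `5 * sizeof(int)`, because adding 1 to an `int (*)[5]` steps over the whole array rather than a single `int`.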

The problem is when we dereference this expression. Paragraph 8 states:

When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

Since dereferencing a pointer to one past the end of an array is not allowed, the behavior is undefined.
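For comparison, here is a sketch of one way to reach the last element without ever applying `*` to a one-past pointer (the helper name `last_element` is hypothetical). It relies on `a` decaying to a pointer to its first `int`, so the one-past pointer already has type `int *`:

```c
/* Hypothetical helper: read the last element through a one-past int pointer. */
int last_element(void)
{
    int a[5] = {1, 2, 3, 4, 5};
    int *end = a + 5;   /* a decays to &a[0]; a + 5 is one past the last int */
    return *(end - 1);  /* end - 1 points at a[4], which may be dereferenced */
}
```

Here the only pointer that is dereferenced is `end - 1`, which points to an existing element, so the restriction in paragraph 8 is not triggered.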

Going back to the expression in the referenced post:

int *ptr = (int*)(&a+1);
printf("%d %d", *(a+1), *(ptr-1));

This is also undefined behavior but for a different reason. In this case, an int (*)[5] is converted to int * and the converted value is subsequently used. The only case where using such a converted value is legal is when an object pointer is converted to a pointer to a character type, e.g. char * or unsigned char *, and then dereferenced to read the bytes of the object's representation.

EDIT:

It seems the two lines above are actually well defined. At the time the pointer dereference *(ptr-1) occurs, the object being accessed has effective type int, which matches the dereferenced type of ptr-1. Casting the pointer value &a+1 from int (*)[5] to int * is valid, and performing pointer arithmetic on the cast pointer value is also valid because it points either inside of a or one element past it.
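A sketch of the cast version, under the reading given in this edit (the helper name `last_via_cast` is hypothetical). The conversion happens on the pointer value itself, so no `*` is applied to `&a + 1`:

```c
/* Hypothetical helper: the cast form from the referenced post. */
int last_via_cast(void)
{
    int a[5] = {1, 2, 3, 4, 5};
    int *ptr = (int *)(&a + 1);  /* convert the one-past pointer; no dereference */
    return *(ptr - 1);           /* steps back to a[4], an object of type int */
}
```

The contrast with `*( ( &a ) + 1 )` is that the explicit cast replaces the unary `*`, which is the operator paragraph 8 forbids on a one-past pointer.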

dbush
  • The second quote ends with *"that is evaluated"*. To understand what that means, see section 6.5.3.2/3. – user3386109 Jun 26 '18 at 19:12
  • @user3386109 The `*` operator is evaluated in this case since it is not directly the operand of a `&` operator. – dbush Jun 26 '18 at 19:17
  • Concerning `int *ptr = (int*)(&a+1)` Disagree about "only case where such a conversion is legal is when converting an object pointer to a pointer to a character type". Instead "A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined." C11 §6.3.2.3 7. As alignment is not an issue, the conversion is well defined. – chux - Reinstate Monica Jun 26 '18 at 19:57
  • @chux While the conversion itself is fine, it's using the converted value that's an issue, i.e. dereferencing the converted pointer. So while the assignment to `ptr` is OK, subsequently reading `*(ptr-1)` is not. Edited to clarify. – dbush Jun 26 '18 at 20:01
  • @dbush As `int *ptr = (int*)(&a+1);` is an acceptable conversion and `ptr-1` is also good, code does not attempt to dereference with `*(ptr-0)` but with `*(ptr-1)`. I respectfully disagree and see this as well defined. `*ptr` is UB. Perhaps AA concerns are of concern, yet there is no changing of underlying data here. – chux - Reinstate Monica Jun 26 '18 at 20:13
  • Disagree with your assessment of `int *ptr = (int*)(&a+1);`; your interpretation would make `int * x = malloc(sizeof(int)); x[0] = 0;` UB too. [See here](https://stackoverflow.com/a/29245267/1505939) for expanded discussion – M.M Jun 26 '18 at 21:23
  • The `malloc` case is OK because the returned memory does not have an effective type until it is assigned to. Actually, now that I think about it I agree with you and @chux. The conversion is valid and `ptr` points to one element past the end of `a`, the subsequent pointer arithmetic is valid because the result points inside of `a`, and the dereference is valid because the effective type of the accessed object (i.e. `int`) matches the dereferenced pointer type. – dbush Jun 27 '18 at 00:23

*( ( &a ) + 1 ) is UB due to

... If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated. C11 §6.5.6 8

( &a ) + 1 points to "one past". Using * on that goes against "shall not".

int a[5] = {1,2,3,4,5};
int *ptr = *( ( &a ) + 1 );

Even if a were declared as int a, this applies due to

For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type. §6.5.6 7
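A small sketch of what paragraph 7 permits for a plain scalar (the helper name `one_past_scalar` is hypothetical): the one-past pointer can be formed and used in arithmetic, but is never dereferenced.

```c
#include <stddef.h>

/* Hypothetical helper: one-past arithmetic on a non-array object. */
ptrdiff_t one_past_scalar(void)
{
    int x = 42;
    int *p = &x;       /* behaves like a pointer to element 0 of an int[1] */
    int *end = p + 1;  /* one past: may be computed and compared, not dereferenced */
    return end - p;    /* distance in elements */
}
```

The subtraction yields 1, since both pointers belong to the same (notional) one-element array.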

chux - Reinstate Monica
  • If `a` was `int a`, `ptr` would have to be `int`, which would make it clearly UB; so I don't think the second part of the answer is needed. – Acorn Jun 26 '18 at 18:44
  • @Acorn Assigning a pointer to an `int` has its own concerns 6.3.2.3 5. Yet the UB here is present on `*( ( &a ) + 1 )` alone. – chux - Reinstate Monica Jun 26 '18 at 18:49
  • I'm not sure if this directly applies since it is not really a pointer to one past the end of the array. Dereferencing it is what makes it a pointer to one past the end of the array. – Christian Gibbons Jun 26 '18 at 18:49
  • @ChristianGibbons `&a` is a pointer. `( &a ) + 1` is "one past" and is a pointer. §6.5.6 8 clearly addresses using `*` on that as with `*( ( &a ) + 1 )`. – chux - Reinstate Monica Jun 26 '18 at 18:51

int *ptr = *( ( &a ) + 1 ); invokes undefined behaviour.

C11 - §6.5.6 "Additive operators" (P8) :

When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object.[...]

msc