Is converting between pointer-to-T, array-of-T and pointer-to-array-of-T ever undefined behaviour?

Question

Consider the following code.

#include <stdio.h>
int main() {
 typedef int T;
 T a[] = { 1, 2, 3, 4, 5, 6 };
 T(*pa1)[6] = (T(*)[6])a;
 T(*pa2)[3][2] = (T(*)[3][2])a;
 T(*pa3)[1][2][3] = (T(*)[1][2][3])a;
 T *p = a;
 T *p1 = *pa1;
 //T *p2 = *pa2; //error in c++
 //T *p3 = *pa3; //error in c++
 T *p2 = **pa2;
 T *p3 = ***pa3;
 printf("%p %p %p %p %p %p %p %p\n", a, pa1, pa2, pa3, p, p1, p2, p3);
 printf("%d %d %d %d %d %d %d %d\n", a[5], (*pa1)[5],
   (*pa2)[2][1], (*pa3)[0][1][2], p[5], p1[5], p2[5], p3[5]);
 return 0;
}

The above code compiles and runs in C, producing the expected results. All the pointer values are the same, as are all the int values. I think the result will be the same for any type T, but int is the easiest to work with.
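
As a side note on the pointer-printing line: `%p` expects a `void *` argument, so a stricter variant converts each pointer explicitly. The following is only a sketch. The helper names `format_ptr` and `all_print_the_same` are made up for illustration, and the pointers are formatted into buffers with `snprintf` so the strings can be compared rather than eyeballed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef int T;

/* Hypothetical helper: writes the %p representation of ptr into out. */
static void format_ptr(char *out, size_t size, void *ptr)
{
    int n = snprintf(out, size, "%p", ptr);
    assert(n > 0 && (size_t)n < size);   /* formatted and not truncated */
    (void)n;                             /* silence unused warning under NDEBUG */
}

/* Hypothetical helper: returns 1 if several of the pointers from the code
   above all print identically, 0 otherwise. The casts are the ones under
   discussion in this question. */
int all_print_the_same(void)
{
    T a[] = { 1, 2, 3, 4, 5, 6 };
    T (*pa1)[6] = (T (*)[6])a;
    T (*pa2)[3][2] = (T (*)[3][2])a;
    T *p1 = *pa1;
    T *p2 = **pa2;

    /* Each pointer is converted to void * before it reaches %p. */
    void *ptrs[] = { a, pa1, pa2, p1, p2 };
    char first[32], other[32];

    format_ptr(first, sizeof first, ptrs[0]);
    for (size_t i = 1; i < sizeof ptrs / sizeof ptrs[0]; ++i) {
        format_ptr(other, sizeof other, ptrs[i]);
        if (strcmp(first, other) != 0)
            return 0;
    }
    return 1;
}
```

On the implementations I would expect to encounter, `all_print_the_same()` returns 1, which matches the observation that all the pointer values are the same.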

I confess to being initially surprised that dereferencing a pointer-to-array yields an identical pointer value, but on reflection I think that is merely the converse of the array-to-pointer decay we know and love.

[EDIT: The commented out lines trigger errors in C++ and warnings in C. I find the C standard vague on this point, but this is not the real question.]

In this question, it was claimed to be Undefined Behaviour, but I can't see it. Am I right?

Code here if you want to see it.

Right after I wrote the above it dawned on me that those errors are because there is only one level of pointer decay in C++. More dereferencing is needed!

 T *p2 = **pa2; //no error in c or c++
 T *p3 = ***pa3; //no error in c or c++

And before I managed to finish this edit, @AntonSavin provided the same answer. I have edited the code to reflect these changes.

`pa1` is a pointer to an array. If you dereference that, you get an array. Any array can be converted to a pointer to its first element. This applies to both `T* p = a;` and `T* p1 = *pa1;` Those are just ordinary array-to-pointer conversions, [conv.array]. — dyp, Aug 28 '14 at 01:18
1) dereferencing a pointer-to-array gives you an lvalue of type "array" which can then decay to a pointer. Dereferencing a pointer to array of array gives you an lvalue of type "array of array" which then decays to a pointer to array, and stops there. 2) The conversion itself is definitely not UB. The question is whether dereferencing the resulting pointer is UB. — T.C., Aug 28 '14 at 01:20
The lines that you marked as errors in C++ are errors in C as well for the very same reasons. It is that your C compiler is set up by default for less strict error checking. — AnT stands with Russia, Aug 28 '14 at 01:27
@dyp: But only one level deep. Hence the compile error in C++. — david.pfx, Aug 28 '14 at 01:39
@T.C.: Yes. I think arrays are always laid out in memory the same, so once you have a pointer the dereferencing must be OK. — david.pfx, Aug 28 '14 at 01:40
@AndreyT: No. They are errors or not. There is no switch that turns on and off errors. — david.pfx, Aug 28 '14 at 01:40
@david.pfx: Well, the language does not define such a notion as "error". Again, both lines are illegal (aka "ill-formed" or "constraint violating") in C and C++ for the very same reasons. The compiler is required to issue a diagnostic message in response. C++ decided to issue an "error" message, while your C compiler issued a mere "warning". This is caused by a mere difference in the default compiler setup. In reality both messages have the same status in C and C++ and indicate illegal code. — AnT stands with Russia, Aug 28 '14 at 01:57
And yes, there's a switch that controls severity of compiler error messages. In GCC it is `-pedantic-errors`. If you want GCC to report illegal code (i.e. "errors") as errors, you absolutely have to specify the `-pedantic-errors` switch. Without that flag the separation between "errors" and "warnings" made by GCC has nothing to do with the status of such violations in the formal language. — AnT stands with Russia, Aug 28 '14 at 01:59
I think this is an interesting question with a good answer. Why the downvote? — david.pfx, Aug 28 '14 at 09:50
@david.pfx *"But only one level deep"* A pointer to an array is not an array, hence no array-to-pointer conversion takes place. — dyp, Aug 28 '14 at 17:08

Anton Savin · Answer 1 · 2014-08-28T19:08:00.360

UPDATE: The following applies to C++ only, for C scroll down. In short, there's no UB in C++ and there is UB in C.

8.3.4/7 says:

A consistent rule is followed for multidimensional arrays. If E is an n-dimensional array of rank i x j x ... x k, then E appearing in an expression that is subject to the array-to-pointer conversion (4.2) is converted to a pointer to an (n - 1)-dimensional array with rank j x ... x k. If the * operator, either explicitly or implicitly as a result of subscripting, is applied to this pointer, the result is the pointed-to (n - 1)-dimensional array, which itself is immediately converted into a pointer.

So this won't produce an error in C++ (and will work as expected):

T *p2 = **pa2;
T *p3 = ***pa3;
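
The one-decay-per-dereference rule can be sketched in C as well. This is an illustration only: `grid` and `first_element` are hypothetical names, and a genuine `T[3][2]` array is used so that the sketch does not depend on the cast discussed below.

```c
#include <assert.h>
#include <stddef.h>

typedef int T;

/* Hypothetical sample data: a real two-dimensional array. */
static T grid[3][2] = { { 1, 2 }, { 3, 4 }, { 5, 6 } };

/* Walks a pointer to T[3][2] down to a T *, one dereference per level. */
T *first_element(T (*pa2)[3][2])
{
    /* *pa2 is an lvalue of type T[3][2]; sizeof sees the whole array. */
    assert(sizeof *pa2 == 6 * sizeof(T));
    /* **pa2 has type T[2]: *pa2 decayed to T (*)[2] and was dereferenced. */
    assert(sizeof **pa2 == 2 * sizeof(T));
    /* ***pa2 is a single T. */
    assert(sizeof ***pa2 == sizeof(T));

    T (*row)[2] = *pa2;   /* first decay: T[3][2] becomes T (*)[2] */
    T *elem = *row;       /* second decay: T[2] becomes T *        */
    return elem;          /* the same pointer as **pa2             */
}
```

Calling `first_element(&grid)` yields `&grid[0][0]`. Assigning `*pa2` directly to a `T *` skips the second step, which is why that line draws a diagnostic.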

Regarding whether this is UB or not. Consider the very first conversion:

T(*pa1)[6] = (T(*)[6])a;

In C++ it's in fact

T(*pa1)[6] = reinterpret_cast<T(*)[6]>(a);

And this is what the standard says about reinterpret_cast:

An object pointer can be explicitly converted to an object pointer of a different type. When a prvalue v of type “pointer to T1” is converted to the type “pointer to cv T2”, the result is static_cast< cv T2 * >(static_cast< cv void * >(v)) if both T1 and T2 are standard-layout types (3.9) and the alignment requirements of T2 are no stricter than those of T1, or if either type is void.

So a is converted to pa1 through static_cast to void* and back. The static_cast to void* is guaranteed to return the real address of a, as stated in 4.10/2:

A prvalue of type “pointer to cv T,” where T is an object type, can be converted to a prvalue of type “pointer to cv void”. The result of converting a non-null pointer value of a pointer to object type to a “pointer to cv void” represents the address of the same byte in memory as the original pointer value.

Next static cast to T(*)[6] is again guaranteed to return the same address as stated in 5.2.9/13:

A prvalue of type “pointer to cv1 void” can be converted to a prvalue of type “pointer to cv2 T,” where T is an object type and cv2 is the same cv-qualification as, or greater cv-qualification than, cv1. The null pointer value is converted to the null pointer value of the destination type. If the original pointer value represents the address A of a byte in memory and A satisfies the alignment requirement of T, then the resulting pointer value represents the same address as the original pointer value, that is, A

So pa1 is guaranteed to point to the same byte in memory as a, and access to the data through it is perfectly valid because the alignment of an array is the same as the alignment of its underlying element type.

What about C?

Consider again:

T(*pa1)[6] = (T(*)[6])a;

In C11 standard, 6.3.2.3/7 states the following:

A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer. When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.

This means that unless the conversion is to char*, the value of the converted pointer is not guaranteed to be equal to the value of the original pointer, resulting in undefined behavior when accessing data through the converted pointer. To make it work, the conversion has to be done explicitly through void*:

T(*pa1)[6] = (T(*)[6])(void*)a;

Conversions back to T*

T *p = a;
T *p1 = *pa1;
T *p2 = **pa2;
T *p3 = ***pa3;

All of these are conversions from array of T to pointer to T, which are valid in both C++ and C, and no UB is triggered by accessing the data through converted pointers.

Thanks! (see my edit) I'm still puzzled why it's OK in C. Can you also answer the question: is there UB here? — david.pfx, Aug 28 '14 at 01:43
@david.pfx: It is not OK in C. It is an error in C as well. You apparently used ideone compiler, which deliberately suppresses some diagnostic messages. In other words, ideone is a broken compiler, whose diagnostic output (or lack thereof) cannot be meaningfully relied upon. Take a normal standalone GCC compiler and compile your code with it in C and C++ mode. You will see that your lines produce diagnostic messages in both C and C++ code. And if you ask the compiler to treat the standard requirements *pedantically*, then you will see that these lines are indeed errors in both C and C++. — AnT stands with Russia, Aug 28 '14 at 02:05
You are bolding the wrong sentence. All the types involved here are standard-layout, and have the same alignment requirements, so the result of the `reinterpret_cast` is well-defined as the double `static_cast` through `void *`. — T.C., Aug 28 '14 at 08:20
I think this is correct and I think it answers my question. No UB. — david.pfx, Aug 28 '14 at 09:48
What about strict aliasing rules? I'm not entirely sure that they don't cause UB in this case: the casts change the enclosing types of the `int`s, even though they don't change the fundamental type that is accessed. So, the question remains: is it UB to access an `int` within a 1D array via a 2D array pointer with different dimensions? — cmaster - reinstate monica, Aug 28 '14 at 17:29
@cmaster unlike C where you should explicitly cast through `void*` or `char*`, in C++ it's done automatically as I wrote in the answer. The pointers are guaranteed to point to the same location in memory. And the layout of arrays is defined in such a way which allows accessing through any array type. — Anton Savin, Aug 28 '14 at 18:07
@cmaster So it seems that in C the above code actually produces UB. — Anton Savin, Aug 28 '14 at 18:17
@cmaster I think the aliasing rules apply to modification and reading (i.e., lvalue-to-rvalue conversion). Neither happen here; and in general the array access `a[x]` is defined in terms of `*(a + x)`. The access always occurs on the level of individual elements. — dyp, Aug 28 '14 at 18:32
@dyp I've read C standard about pointer conversion. It doesn't guarantee that after conversion from pointer of one type to pointer to another type they will be the same. — Anton Savin, Aug 28 '14 at 18:49
Well, the UB due to strict aliasing rules does not come from the cast (although that's a requirement for the disaster), it comes from the freedom of the compiler *to rearrange reads and writes*. And it certainly has nothing to do with *how* you cast the pointer. An intermediate cast to `void*` does not help you. When you write `float foo = 42; int* bar = (void*)&foo; printf("%08x\n", *bar);` you may get the bit representation of 42 as a `float`, or you may not, simply because the compiler is not required to write `foo` to memory before it loads the value at `*bar`. — cmaster - reinstate monica, Aug 29 '14 at 16:18

AnT stands with Russia · Answer 2 · 2014-08-28T17:11:14.220

The only reason your code compiles in C is that your default compiler setup allows the compiler to implicitly perform some illegal pointer conversions. Formally, this is not allowed by the C language. These lines

T *p2 = *pa2;
T *p3 = *pa3;

are ill-formed in C++ and produce constraint violations in C. In casual parlance, these lines are errors in both C and C++ languages.

Any self-respecting C compiler will issue (is actually required to issue) diagnostic messages for these constraint violations. GCC compiler, for one example, will issue "warnings" telling you that pointer types in the above initializations are incompatible. While "warnings" are perfectly sufficient to satisfy standard requirements, if you really want to use GCC compiler's ability to recognize constraint violating C code, you have to run it with -pedantic-errors switch and, preferably, explicitly select standard language version by using -std= switch.

In your experiment, C compiler performed these implicit conversions for you as a non-standard compiler extension. However, the fact that GCC compiler running under ideone front completely suppressed the corresponding warning messages (issued by the standalone GCC compiler even in its default configuration) means that ideone is a broken C compiler. Its diagnostic output cannot be meaningfully relied upon to tell valid C code from invalid one.

As for the conversion itself... It is not undefined behavior to perform this conversion. But it is undefined behavior to access array data through the converted pointers.

I have edited the code to remove this objection. While I do not accept your comments as authoritative, they also relate to a side issue. The main question is about UB. Do you still say that this is UB? — david.pfx, Aug 28 '14 at 06:11

score 2 · Accepted Answer · edited Jun 20 '20 at 09:12

This is a C-only answer.

C11 (n1570) 6.3.2.3 p7

A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned^*) for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer.

^*) In general, the concept “correctly aligned” is transitive: if a pointer to type A is correctly aligned for a pointer to type B, which in turn is correctly aligned for a pointer to type C, then a pointer to type A is correctly aligned for a pointer to type C.

The standard is a little vague about what happens if we use such a pointer (strict aliasing aside) for anything other than converting it back, but the intent and widespread interpretation is that such pointers should compare equal (and have the same numerical value, e.g. they should also be equal when converted to uintptr_t). As an example, think about (void *)array == (void *)&array (converting to char * instead of void * is explicitly guaranteed to work).

T(*pa1)[6] = (T(*)[6])a;

This is fine, the pointer is correctly aligned (it’s the same pointer as &a).
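
A minimal sketch of that comparison, with the hypothetical function name `same_address`. The first check relies on the rule that a pointer to an object and a pointer to a subobject at its beginning compare equal. The second check is the cast under discussion, so its result is reported instead of asserted.

```c
#include <assert.h>

typedef int T;

/* Hypothetical helper: returns 1 if the cast pointer compares equal to &a. */
int same_address(void)
{
    T a[6] = { 1, 2, 3, 4, 5, 6 };
    T *first = a;                  /* array-to-pointer conversion       */
    T (*whole)[6] = &a;            /* no cast: &a already has this type */
    T (*cast)[6] = (T (*)[6])a;    /* the conversion from the question  */

    /* Pointer to the array and pointer to its first element: same address. */
    assert((void *)first == (void *)whole);

    /* Both operands have type T (*)[6], so they can be compared directly. */
    return cast == whole;
}
```

On common implementations `same_address()` returns 1, in line with the interpretation described above.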

T(*pa2)[3][2] = (T(*)[3][2])a; // (i)
T(*pa3)[1][2][3] = (T(*)[1][2][3])a; // (ii)

(i) is safe iff T[6] has the same alignment requirements as T[3][2], and (ii) is safe iff T[6] has the same alignment requirements as T[1][2][3]. It would seem strange to me if they differed, but I cannot find a guarantee in the standard that they have the same alignment requirements.
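
One place to look is the description of `_Alignof` in C11 6.5.3.4p3, which says that when the operator is applied to an array type, the result is the alignment requirement of the element type. Whether that settles the question is a judgment call, but it does make the condition checkable at compile time. The sketch below assumes a C11 compiler; `array_alignments_match` is a hypothetical name.

```c
#include <assert.h>   /* C11: provides the static_assert macro */

typedef int T;

/* Compile-time checks: each array type reports the alignment of T. */
static_assert(_Alignof(T[6]) == _Alignof(T), "T[6] vs T");
static_assert(_Alignof(T[3][2]) == _Alignof(T), "T[3][2] vs T");
static_assert(_Alignof(T[1][2][3]) == _Alignof(T), "T[1][2][3] vs T");

/* Hypothetical helper: run-time form of the same comparison.
   Returns 1 when all three array types agree on alignment. */
int array_alignments_match(void)
{
    return _Alignof(T[6]) == _Alignof(T[3][2])
        && _Alignof(T[3][2]) == _Alignof(T[1][2][3]);
}
```

If the translation unit compiles, the three array types have the same alignment on that implementation, so the alignment condition for (i) and (ii) holds there.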

T *p = a; // safe, of course
T *p1 = *pa1; // *pa1 has type T[6], after array-to-pointer conversion it's T*, OK
T *p2 = **pa2; // **pa2 has type T[2], or T* after conversion, OK
T *p3 = ***pa3; // ***pa3, has type T[3], T* after conversion, OK

Ignoring the UB caused by passing int * where printf expects void *, let’s look at the expressions in the arguments for the next printf, first the defined ones:

a[5] // OK, of course
(*pa1)[5]
(*pa2)[2][1]
(*pa3)[0][1][2]
p[5] // same as a[5]
p1[5]

Note that strict aliasing isn’t a problem here: no wrongly-typed lvalue is involved, and we access T as T.

The following expressions depend on the interpretation of out-of-bounds pointer arithmetic, the more relaxed interpretation (allowing container_of, array flattening, the “struct hack” with char[], etc.) allows them as well; the stricter interpretation (allowing a reliable run-time bounds-checking implementation for pointer arithmetic and dereferencing, but disallowing container_of, array flattening (but not necessarily array “lifting”, what you did), the struct hack, etc.) renders them undefined:

p2[5] // UB, p2 points to the first element of a T[2] array
p3[5] // UB, p3 points to the first element of a T[3] array
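
Under the stricter interpretation, one way to stay in bounds is to subscript through the array type itself, so that no index exceeds the dimension it applies to. A minimal sketch, with hypothetical names (`cells`, `sum_via_2d_view`, `last_element`) and a genuine `T[3][2]` array as sample data:

```c
#include <assert.h>

typedef int T;

/* Hypothetical sample data: a real two-dimensional array. */
static T cells[3][2] = { { 1, 2 }, { 3, 4 }, { 5, 6 } };

/* Sums all six elements through a T (*)[3][2] view. Every subscript stays
   inside the bounds of the array type it is applied to. */
T sum_via_2d_view(T (*pa2)[3][2])
{
    assert(pa2 != 0);
    T sum = 0;
    for (int i = 0; i < 3; ++i)        /* i indexes the T[3][2] array */
        for (int j = 0; j < 2; ++j)    /* j indexes one T[2] row      */
            sum += (*pa2)[i][j];
    return sum;
}

/* Reaches the sixth element as (*pa2)[2][1] instead of p2[5]. */
T last_element(T (*pa2)[3][2])
{
    assert(pa2 != 0);
    return (*pa2)[2][1];
}
```

With the sample data, `sum_via_2d_view(&cells)` is 21 and `last_element(&cells)` is 6. The same element that `p2[5]` tries to reach is obtained without stepping past the end of the first `T[2]` row.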
