
Suppose I have a structure:

struct Foo{
    char a[4];
    char b[4];
    //...some other members
} foo, foo2;

Let's assume there is no padding between a and b, so that offsetof(Foo, a) == 0 and offsetof(Foo, b) == 4, and that Foo is POD.
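These layout assumptions can be turned into compile-time checks. The sketch below leaves out the question's "other members", which is an assumption made to keep it short. Passing these checks only confirms the layout. It says nothing about whether foo.a[4] may be accessed.

```cpp
#include <cstddef>      // offsetof
#include <type_traits>  // std::is_trivial, std::is_standard_layout

struct Foo {
    char a[4];
    char b[4];
    // ...the question's other members are omitted in this sketch
};

// The question's layout assumptions, checked by the compiler.
static_assert(offsetof(Foo, a) == 0, "a is the first member");
static_assert(offsetof(Foo, b) == 4, "no padding between a and b");

// In C++11 terms, POD means trivial and standard-layout.
static_assert(std::is_trivial<Foo>::value && std::is_standard_layout<Foo>::value,
              "Foo is POD");
```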

How does the standard define the following?

putc(foo.a[4], stdout);
// Am I allowed to read past the end of the `a` array?

foo.a[4] = 0;
// What is foo.b[0] now?

memcpy(foo.a, "ABCDEFGH", 8);
// Does foo.b now hold "EFGH"?

memcpy(foo2.a, foo.a, 8);
// Does foo2.b now equal foo.b?

In my understanding, if I start with the following legal expression:

*((char*)&foo + offsetof(Foo, b))

I can apply a series of substitutions:

*((char*)&foo + 4)
*((char*)&foo + 0 + 4)
*((char*)&foo + offsetof(Foo, a) + 4)
*(foo.a + 0 + 4)
*(foo.a + 4)
foo.a[4]

The other two (the memcpy calls) are legal because foo.a is equivalent to (char*)&foo, and we can memcpy part of a structure if we are careful enough.
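Whatever the verdict on memcpy(foo.a, ..., 8), one way to sidestep the question is to keep each copy inside a single member array. This is a minimal sketch under a few assumptions:

- `copy_chars` is a hypothetical loop that stands in for `std::memcpy`. It is used only so the code can run in a constant expression, where the compiler has to reject out-of-bounds accesses.
- `set_ab` and `copy_ab` are illustrative names. They do not come from the question.

```cpp
#include <cstddef>  // std::size_t

struct Foo {
    char a[4];
    char b[4];
};

// Hypothetical stand-in for std::memcpy: a plain loop, so the sketch can be
// evaluated in a constant expression (std::memcpy is not constexpr).
constexpr void copy_chars(char* dst, const char* src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i];
    }
}

// Writes 8 chars into a and b; each call touches exactly one member array,
// so no write goes through a pointer past the end of a.
constexpr void set_ab(Foo& f, const char* src8) {
    copy_chars(f.a, src8, sizeof f.a);
    copy_chars(f.b, src8 + sizeof f.a, sizeof f.b);
}

// Copies a and b from src to dst, again one member array at a time.
constexpr void copy_ab(Foo& dst, const Foo& src) {
    copy_chars(dst.a, src.a, sizeof dst.a);
    copy_chars(dst.b, src.b, sizeof dst.b);
}
```

In ordinary run-time code, the same split would be two std::memcpy calls, one per member, of sizeof foo.a and sizeof foo.b bytes.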


Answer

It is not permitted to read from or write to foo.a[4].

As specified in [dcl.array]/6,

An object of type "array of N U" consists of a contiguously allocated non-empty set of N subobjects of type U, known as the elements of the array, and numbered 0 to N-1.

There is no element 4 in foo.a.

Now, [basic.compound]/3 does state that:

[...] For purposes of pointer arithmetic (7.6.6) and comparison (7.6.9, 7.6.10), a pointer past the end of the last element of an array x of n elements is considered to be equivalent to a pointer to a hypothetical array element n of x and an object of type T that is not an array element is considered to belong to an array with one element of type T. [...]

This makes the expression foo.a + 4 well-defined: it is a pointer to the "hypothetical" element with index 4. However, no such object actually exists, so there is nothing there to read from or write to.
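A constexpr sketch can show this distinction. The past-the-end pointer foo.a + 4 works as a loop bound because it is only compared against and never dereferenced. `count_nonzero_a` and `sample` are hypothetical names.

An evaluation with undefined behaviour is not a constant expression. So if the loop were changed to also read `*end`, the `static_assert` should stop compiling on mainstream compilers.

```cpp
#include <cstddef>  // std::size_t

struct Foo {
    char a[4];
    char b[4];
};

// Counts the non-zero chars in f.a. `end` is the past-the-end pointer f.a + 4:
// forming it and comparing against it is fine; dereferencing it is not.
constexpr std::size_t count_nonzero_a(const Foo& f) {
    const char* const end = f.a + 4;  // points to the "hypothetical" element 4
    std::size_t n = 0;
    for (const char* p = f.a; p != end; ++p) {
        if (*p != '\0') {  // *p is only evaluated for p in [f.a, end)
            ++n;
        }
    }
    return n;
}

constexpr Foo sample{{'A', 'B', '\0', 'D'}, {'E', 'F', 'G', 'H'}};
static_assert(count_nonzero_a(sample) == 3, "3 of the 4 chars in a are non-zero");
```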

You might object that there is an object there: foo.b[0]. However, this interpretation is contradicted by the pointer value taxonomy given in [basic.compound]/3:

[...] Every value of pointer type is one of the following:

  • a pointer to an object or function (the pointer is said to point to the object or function), or
  • a pointer past the end of an object (7.6.6), or
  • the null pointer value for that type, or
  • an invalid pointer value.

[...]

Because the pointer foo.a + 4 is a pointer past the end of foo.a, it cannot also be a pointer to foo.b[0], even though the address value they represent might be the same (assuming no padding between a and b). It can only fall into one of the four categories.
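If the intent is to treat a and b as one 8-char run, a sketch that respects this taxonomy picks the member explicitly. That way foo.b[0] is always reached through a pointer derived from foo.b. `char_at` is a hypothetical helper, and it assumes i < 8.

```cpp
#include <cstddef>  // std::size_t

struct Foo {
    char a[4];
    char b[4];
};

// Hypothetical helper: reads position i (precondition: i < 8) of the
// sequence a[0..3], b[0..3]. Position 4 is read as f.b[0], through a pointer
// derived from f.b, never as f.a[4].
constexpr char char_at(const Foo& f, std::size_t i) {
    return i < 4 ? f.a[i] : f.b[i - 4];
}
```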

The standard has a wording gap as to whether it's actually legal to apply the unary * operator to a past-the-end pointer value. [expr.unary.op]/1 states that the unary * operator "yields an lvalue of type T denoting the object or function to which the operand points." If there is no such object, it is not clear whether the behaviour is well-defined; do you get an "lvalue past the end", or do you get UB? However, all known implementations treat it as not being UB (they allow it during constant expression evaluation). It is only when you try to read or write that the behaviour is undefined.

Brian Bi