Is moving a pointer to past a struct member UB? And accessing it?

Question

Look at this snippet:

struct S {
    float x, y, z;
};

void fn() {
    S s = { 0, 0, 0 };
    float *p = &s.x;
    p += 2;           // 1.
    if constexpr(sizeof(S)==sizeof(float)*3) { // if S has no padding
        float t = *p; // 2.
        s.z = 1;
        float z = *p; // 3.
    }
}

My questions are:

1. Is p += 2; UB? (p is moved two elements past s.x, so it points beyond &s.x+1.)
2. We know that S has no padding here. Is float t = *p; UB, or is it well defined that t contains the value of s.z?
3. May an optimizer optimize the access through p in float z = *p;? That is, is z allowed to be 0? (May the compiler fail to see that p==&s.z?)

Does the answer to 2. and 3. differ if the if constexpr is not there, but we know (for example from the compiler documentation, or from previous experience) that S has no padding?

If 1. is UB (which makes 2. and 3. meaningless), what is the answer to 2. and 3. if p is set as below? Here p is moved with the help of an array in a union; otherwise the snippet is the same.

union U {
    S s;
    float a[3];
};

void fn() {
    U u;
    u.s.x = 0; u.s.y = 0; u.s.z = 0;
    float *p = u.a;  // here, p==&u.s.x as well
    if constexpr(sizeof(S)==sizeof(float)*3) { // if S has no padding
        p += 2;
        float t = *p; // 2.
        u.s.z = 1;
        float z = *p; // 3.
    }
}

Just calculating a pointer value is never UB. Dereferencing is, if you calculated something unspecified. – user0042, Nov 25 '17 at 20:34
I think you need to read up what undefined behavior actually is and how it works. By then you'll understand why this isn't undefined behavior. — Bauss, Nov 25 '17 at 20:37
@user0042: Per C++ clause 8.7, paragraph 4, the behavior of adding values to pointers is explicitly undefined except for additions that move a pointer around within an array (including pointing to one beyond the last element but no further). For the purposes of this clause, a single object is considered an array of one object. By “explicitly undefined,” I mean the standard does not just leave this open. It literally says “otherwise, the behavior is undefined.” — Eric Postpischil, Nov 25 '17 at 20:41
For the second part: When you write to `u.s` it becomes the active member, and `u.a` is no longer there. Accessing `u.a` through a pointer cannot be valid if `u.a` is not present. Also, if you had asked this for C, the answer would likely have been different. — Bo Persson, Nov 25 '17 at 21:03
@Bauss: It is undefined behavior. See the answer, and my comment above. If you believe otherwise, tell us which statements in the standard define it. — Eric Postpischil, Nov 25 '17 at 22:58

Stephan Lechner · Accepted Answer · Nov 25 '17 at 22:34

The statement p += 2 on its own is undefined behaviour. p points to a single float object, not to an array of floats. For pointer arithmetic, a single object is considered an array of one element (cf., for example, 5.7 (4) of this online standard draft), so p may be moved at most one past s.x. Moving it two past the end is UB in itself (cf. 5.7 (5)), regardless of whether the pointer is dereferenced afterwards.
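To make the rule concrete, here is a small C++17 sketch (not part of the original answer; the helper name span_length is made up for illustration) that forms only the pointers the standard permits. &s.x + 1 is the one-past-the-end pointer of the one-element "array" s.x; the question's p += 2 would go one step further and is left commented out. A genuine float[3] allows offsets 0 through 3.

```cpp
#include <cassert>
#include <cstddef>

struct S { float x, y, z; };

// Distance in elements between two pointers into the same (possibly
// one-element) array. Forming `last` one past the end is allowed;
// dereferencing it is not.
std::ptrdiff_t span_length(const float* first, const float* last) {
    assert(first <= last);
    return last - first;
}

void demo() {
    S s{1.0f, 2.0f, 3.0f};
    const float* px = &s.x;
    const float* end_x = px + 1;   // OK: one past the end of the float[1] that s.x counts as
    // const float* bad = px + 2;  // UB even without a dereference (the question's p += 2)
    assert(span_length(px, end_x) == 1);

    float a[3] = {1.0f, 2.0f, 3.0f};
    const float* p2 = a + 2;       // OK: a really is an array of three floats
    assert(*p2 == 3.0f);           // dereferencing a[2] is fine
    assert(span_length(a, a + 3) == 3);
}
```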

Note that even if you declare three consecutive members of type float, and even if the compiler introduces no padding between them, neither the first member nor the complete struct object becomes an array in terms of the standard. And even if the memory layout happens to match the access we have in mind, the compiler is not required to translate statements containing UB in any way we might expect.
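If the goal is simply to access the members by index, one layout-independent alternative (my suggestion, not from this answer; kMembers and member_at are hypothetical names) is a table of pointers to members. It never relies on padding or on S being an array, and because it refers to s.z directly, the compiler cannot lose track of the aliasing asked about in (3).

```cpp
#include <cassert>
#include <cstddef>

struct S { float x, y, z; };

// Pointers to data members: indexing this table never assumes anything
// about padding or about S being an array.
constexpr float S::* kMembers[] = { &S::x, &S::y, &S::z };

// Returns a reference to member number i (0 = x, 1 = y, 2 = z).
float& member_at(S& s, std::size_t i) {
    assert(i < 3);
    return s.*kMembers[i];
}
```

Mirroring the question, float t = member_at(s, 2); s.z = 1; float z = member_at(s, 2); gives t == 0 and z == 1 with no undefined behaviour.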

So to directly answer your question:

(1) is UB due to invalid pointer arithmetic.

(2) is UB because it dereferences an invalid pointer.

(3) is UB for the same reason as (2), so the question of whether the compiler may optimize the access cannot be answered; it does not make sense.

Concerning the union construct: in C++ (unlike in C), reading a member of a union other than the one most recently written is again UB. So writing union member s and then reading union member a also leads to UB (though now for a different reason).
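In C++ the commonly recommended well-defined substitute for reading an inactive union member is to copy the bytes with std::memcpy (C++20 also provides std::bit_cast). A minimal sketch, assuming the same three-float S and a hypothetical helper to_array; the static_asserts turn the "no padding" assumption into a compile-time check instead of an if constexpr.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

struct S { float x, y, z; };

// Copies the object representation of s into out, instead of writing u.s
// and then reading u.a through a union.
void to_array(const S& s, float (&out)[3]) {
    static_assert(std::is_trivially_copyable_v<S>, "memcpy needs a trivially copyable type");
    static_assert(sizeof(S) == sizeof(float) * 3, "S must have no padding");
    std::memcpy(out, &s, sizeof out);
}

void demo() {
    S s{0.0f, 0.0f, 0.0f};
    float a[3];
    s.z = 1.0f;
    to_array(s, a);
    assert(a[2] == 1.0f);  // a[2] holds a copy of s.z
}
```

Note that the array is a snapshot: writing s.z afterwards does not change a[2], so the copy has to be repeated after each update.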

Thanks, I've just added a supplement to my question, sorry for the late edit. — geza, Nov 25 '17 at 20:54
Sorry, did not "reload" and so saw the new section in the question just now. See edited answer. — Stephan Lechner, Nov 25 '17 at 21:14

score 0 · Answer 2 · answered Nov 26 '17 at 15:34

This is an instance of the "what is an object exactly?" argument, which hinges on ambiguous language in the standard and has never been resolved to anyone's satisfaction. Stephan Lechner's answer is correct on one reading of the ambiguous language, in which each float field within the structure is an object in itself. However, the standard can also be read to say that the "object" is the entire struct, in which case the pointer arithmetic and dereference are perfectly valid.

A strong argument for the "object is the entire struct" interpretation is that the pointer arithmetic in the question is isomorphic to

#include <cstddef>  // offsetof
struct S { float a, b, c; };

// Reads sp->c by adding c's byte offset to the address of *sp.
float fn(S *sp)
{
    return *(float *)(((char *)sp) + offsetof(S, c));
}

which had better be valid or dozens of real programs will break.
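For reference, here is a complete variant of the same pattern (a sketch; read_c is a hypothetical name) that copies the bytes with std::memcpy instead of dereferencing a cast float *, and checks the layout with static_assert. It still performs arithmetic on an unsigned char * into the struct, which is exactly the step whose validity is debated here, so it illustrates the pattern rather than settling the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct S { float a, b, c; };

// offsetof is defined for standard-layout types such as S.
static_assert(offsetof(S, c) == 2 * sizeof(float), "unexpected padding before c");

// Reads field c via its byte offset, copying the bytes into a local float
// rather than dereferencing a float* produced by a cast.
float read_c(const S* sp) {
    float out;
    std::memcpy(&out, reinterpret_cast<const unsigned char*>(sp) + offsetof(S, c), sizeof out);
    return out;
}

void demo() {
    S s{0.0f, 0.0f, 0.0f};
    s.c = 1.0f;
    assert(read_c(&s) == 1.0f);
}
```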

(It gets even messier when a chunk of memory has no "declared type", e.g. when allocated with malloc.)

Answered by zwol

That's what I thought at first. But as I understand it, the standard only allows pointer arithmetic when the pointer points to an actual array: http://eel.is/c++draft/expr.add#4. – geza Nov 26 '17 at 17:16
relevant question: https://stackoverflow.com/questions/47498585/is-adding-to-a-char-pointer-ub-when-it-doesnt-actually-point-to-a-char-arr – geza Nov 26 '17 at 17:31
@geza Right, but that has to be interpreted in a way that allows `offsetof` to work (on POD types). And it's not _just_ that any object can be treated as an array of `char`, because it has to be legitimate to cast back to the type of the field at the offset. I really don't see a way to allow `offsetof` without also allowing the OP's code. – zwol Nov 26 '17 at 17:32
hmm, why do you think that if `offsetof` is OK, then the example in the question should be OK as well? – geza Nov 26 '17 at 17:35
@geza It's not that I think the example in the question should be OK. It's that I don't see a way to write the C++ standard such that `offsetof` is OK but the example in the question is not OK, and vice versa. – zwol Nov 26 '17 at 17:53
They could add an explicit exception for `char *`: arithmetic on it would be allowed (whether or not it points into an actual array) as long as it doesn't point outside the complete object. For me, this would be a sensible thing to do. – geza Nov 26 '17 at 17:56
