
Consider this union:

union A{
  int a;
  struct{
    int b;
    } c;
  };

The types of c and a are not layout-compatible, so it is not possible to read the value of b through a:

A x;
x.c.b=10;
x.a+x.a; //undefined behaviour (UB)

Trial 1

For the case below, I think that since C++17 I also get undefined behavior:

A x;
x.a=10;
auto p = &x.a; //(1)
x.c.b=12;      //(2)
*p+*p;         //(3) UB

Let's consider [basic.type]/3:

Every value of pointer type is one of the following:

  • a pointer to an object or function (the pointer is said to point to the object or function), or
  • a pointer past the end of an object ([expr.add]), or
  • the null pointer value ([conv.ptr]) for that type, or
  • an invalid pointer value.

Let's call these four categories of pointer values the pointer value genres.

The value of a pointer may transition from one of the above-mentioned genres to another, but the standard is not really explicit about that. Feel free to correct me if I am wrong. So I suppose that at (1) the value of p is a pointer to value. Then in (2) a life ends and the value of p becomes an invalid pointer value. So in (3) I get UB because I try to access the value of an object (a) out of its lifetime.

Trial 2

Now consider this weird code:

A x;
x.a=10;
auto p = &x.a;                 //(1)
x.c.b=12;                      //(2)
p = reinterpret_cast<int*>(p); //(2')
*p+*p;                         //(3) UB?

Could the reinterpret_cast<int*>(p) change the pointer value genre from invalid pointer value to a pointer to value?

reinterpret_cast<int*>(p) is defined to be equivalent to static_cast<int*>(static_cast<void*>(p)), so let's consider how the static_cast from void* to int* is defined, [expr.static.cast]/13:

A prvalue of type “pointer to cv1 void” can be converted to a prvalue of type “pointer to cv2 T”, where T is an object type and cv2 is the same cv-qualification as, or greater cv-qualification than, cv1. If the original pointer value represents the address A of a byte in memory and A does not satisfy the alignment requirement of T, then the resulting pointer value is unspecified. Otherwise, if the original pointer value points to an object a, and there is an object b of type T (ignoring cv-qualification) that is pointer-interconvertible with a, the result is a pointer to b. Otherwise, the pointer value is unchanged by the conversion.

So in our case the original pointer pointed to the object a. So I suppose the reinterpret_cast will not help because a is not within its lifetime. Is my reading too strict? Could this code be well defined?

Oliv
  • Possible duplicate of [3 trials to access an active union member by using the value of a pointer to an inactive member](https://stackoverflow.com/questions/56307738/3-trials-to-access-an-active-union-member-by-using-the-value-of-a-pointer-to-an) – Retired Ninja May 25 '19 at 18:51
  • Ending the lifetime of the pointee does not make the pointer value invalid. – T.C. May 25 '19 at 18:53
  • @T.C. Is that so? That would answer the two questions I have just posted. But then, when does the pointer value genre change? – Oliv May 25 '19 at 18:56
  • @Evg RetiredNinja linked to a now deleted question you don't have the rights to see. – Swordfish May 25 '19 at 19:06
  • @Evg Gain 10000 rep to see. – πάντα ῥεῖ May 25 '19 at 19:06
  • @Oliv I am probably not completely grasping what you're asking, but my gut feeling is that it's never possible to fix something that's already invalid by using a cast (of any kind). You'll be entering the land of _undefined behavior_. – πάντα ῥεῖ May 25 '19 at 19:10
  • @πάνταῥεῖ, wilco. – Evg May 25 '19 at 20:12
  • @Oliv: `int *p = new int; delete p;` or `int *p; { int a; p = &a; }`. In both examples, `p` ends up having an invalid pointer value. But changing the active union member doesn't invalidate pointers to the previously-active-and-now-inactive member, as the storage still exists. – geza May 25 '19 at 21:04
  • Accessing the inactive union member is undefined behavior in C++, see [Accessing inactive union member and undefined behavior?](https://stackoverflow.com/q/11373203/608639) I don't believe you get to the point where you can use the pointer you cast. You stumble and fall earlier in the process. – jww May 26 '19 at 00:01
  • @Oliv A little bit off-topic, but these language-lawyer discussions always intrigue me. May I ask what kind of library you are working on that you need to consider such low-level concepts as union members' lifetimes? The only thing I can think of is a hand-written `variant`, but what's wrong with the `std` one then? – R2RT May 26 '19 at 09:32
  • @R2RT I am working on representation theory; the way the language is specified is my main interest. – Oliv May 26 '19 at 10:00
  • @jww: Accessing an inactive union member is forbidden *because it breaks strict aliasing*. If you are in one of the rare cases where strict aliasing isn't broken, then it is allowed (for example, writing any union member and then reading a character type is allowed because strict aliasing allows use of a character lvalue no matter the true dynamic type). – Ben Voigt May 26 '19 at 13:56

3 Answers

2

Then in (2) a life ends and the value of p becomes an invalid pointer value.

Incorrect. Pointers only become invalid when they point into memory that has ended its storage duration.

The pointer in this case becomes a pointer to an object outside of its lifetime. The object it points to is gone, but the pointer is not "invalid" in the way the specification means it. [basic.life] spends quite a bit of time explaining what you can and cannot do to pointers to objects outside of their lifetime.

reinterpret_cast cannot turn a pointer to an object outside of its lifetime into a pointer to a different object that is within its lifetime.
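
As an illustration of the point above, here is a minimal sketch of one way to stay within the rules: the stale pointer is only used as an address (which [basic.life] permits for storage that is still allocated), and a fresh pointer is taken from the member that is now active. The helper name `sum_via_active_member` is hypothetical and not part of the question.

```cpp
#include <cassert>

union A {
    int a;
    struct {
        int b;
    } c;
};

// Hypothetical helper, for illustration only.
int sum_via_active_member() {
    A x;
    x.a = 10;        // a is the active member
    int* p = &x.a;   // p points to x.a
    x.c.b = 12;      // c becomes the active member; the lifetime of x.a ends

    // p is not an invalid pointer value: the storage still exists, so it
    // may still be used as an address (as if it were a void*).
    assert(static_cast<void*>(p) == static_cast<void*>(&x.c.b));

    // Reading *p here would access x.a outside its lifetime.
    // Take a new pointer to the active member instead.
    p = &x.c.b;
    return *p + *p;
}
```

With this version, the final expression reads `x.c.b` through a pointer obtained from `x.c.b` itself, so no cast is needed to "repair" anything.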

Nicol Bolas
  • _reinterpret_cast cannot turn a pointer to an object outside of its lifetime into a pointer to a different object that is within its lifetime_ So your opinion is that "there **is** an object" here https://timsong-cpp.github.io/cppwp/n4659/expr.static.cast#13 means that the object shall be within its lifetime? – Language Lawyer May 26 '19 at 01:13
  • @NicolBolas: ["An object is created ... when implicitly changing the active member of a union"](http://eel.is/c++draft/intro.object#1) so wouldn't the "replacement with a new object at the same address" rule kick in, rendering the existing pointer already capable of referring to the new object? – Ben Voigt May 26 '19 at 04:12
  • So `*p+*p` is UB in both cases? Is this what you are saying? – Oliv May 26 '19 at 06:59
  • If the pointer value is not *invalid* when the object's lifetime ends, then it is a *pointer to* an object [basic.type/3.1](https://timsong-cpp.github.io/cppwp/n4659/basic.types#basic.compound-3.1). But there is no object. I think there is a hole in the standard. A pointer value can also represent a storage address. This is a fifth pointer value genre alongside (*pointer to*, *pointer past the end*, *null pointer value*, *invalid pointer value*). – Oliv May 26 '19 at 07:03
  • I have just found this paragraph, which half satisfies me: [basic.stc/4](https://timsong-cpp.github.io/cppwp/n4659/basic.stc#4). So we can infer that before the storage duration ends, the pointer value is not invalid... – Oliv May 26 '19 at 07:38
0

The notion of objects in the standard is rather abstract and differs somewhat from intuition. An object may be within its lifetime or not, and objects not within their lifetimes can have the same address. This is why unions work at all: the definition of active member is "the member that is within its lifetime".

A pointer to an object not within its lifetime is still a pointer to object. reinterpret_cast only changes the type of the pointer, not its validity. The UB you get when casting to non-pointer-interconvertible types is due to the strict-aliasing rule, not due to the validity of the pointer.

In all your trials, including your follow-up question, you are using an object not within its lifetime in ways that aren't allowed, i.e. accessing it, and they are consequently UB.
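
To make the "same address" remark concrete, here is a small sketch, assuming the union `A` from the question. The function name `members_share_address` is hypothetical. It only compares addresses and never reads an inactive member.

```cpp
#include <cassert>

union A {
    int a;
    struct {
        int b;
    } c;
};

// Hypothetical helper, for illustration only.
bool members_share_address() {
    A x;
    x.a = 10;                    // a is within its lifetime
    const void* pa = &x.a;
    assert(pa == static_cast<const void*>(&x));  // a member lives at the union's address

    x.c.b = 12;                  // now c is within its lifetime, a is not
    const void* pb = &x.c.b;

    // Same storage, two different objects, at most one of them alive.
    return pa == pb;
}
```

The comparison is about storage only: equal addresses say nothing about which object may be accessed through a given `int*`.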

Passer By
  • So you think the argument of BenVoigt in his answer to the other question does not hold? – Oliv May 27 '19 at 13:36
  • I don't agree with the second paragraph: there is no violation of the [strict-aliasing rule](http://eel.is/c++draft/basic.lval#11) (this new version resolved a core language issue). – Oliv May 27 '19 at 13:47
  • A pointer to an object not within its lifetime is still a pointer to object. I don't find documentation for that. This is my point. The only thing I have found is that the pointer is not an invalid pointer: [basic.stc/4](https://timsong-cpp.github.io/cppwp/n4659/basic.stc#4) – Oliv May 27 '19 at 13:49
  • "A pointer to an object not within its lifetime is still a pointer to object"... It has a pointer to object type, I agree that does not change. But is its value still a "pointer to" value? If what I just wrote seems to you weird, read carefully [basic.type]/3 linked in the question above. – Oliv May 27 '19 at 13:55
  • @Oliv The strict-aliasing rule violation I mentioned was to clarify that casting pointers around is usually UB through that, not through pointer validity. _"The end of duration for storage"_ isn't _"end of lifetime"_; you get that only when the storage itself is gone, i.e. automatic variables going out of scope or deleting dynamic stuff. – Passer By May 27 '19 at 14:12
  • Did you read BenVoigt's answer? His interpretation is that when the active member of a union changes from A to B, one can consider that B reuses the storage of member A, so that the rule in basic.life applies and the pointer to A becomes a pointer to B. Formally that looks correct, but I feel this was not the intent of the people who wrote the standard. – Oliv May 27 '19 at 15:29
  • I suppose the reinterpret cast is not UB thanks to [basic.life/6.4](http://eel.is/c++draft/basic.life#6.4). – Oliv May 27 '19 at 15:34
  • @Oliv I did read his answer and I disagree. That applies to cases where you do `new (&i) int` and `i` now refers to the newly created `int`. – Passer By May 28 '19 at 03:02
  • The notion that creating a union of PODs doesn't simultaneously create all the objects in it causes needless confusion and ambiguity compared with simply requiring that conflicting accesses to different objects within a union must be separated by actions which would suggest a relationship between them. Unfortunately, people have become so invested in that model that I don't foresee the ability to usefully employ unions ever being anything more than a "popular extension" that can be enabled via `-fno-strict-aliasing` or similar options. – supercat May 28 '19 at 21:22
0

Every version to date of the C and C++ Standards has been ambiguous or contradictory with regard to what can be done with addresses of union members. The authors of the C Standard didn't want to require that compilers make pessimistic allowances for the possibility that functions might be invoked by constructs like:

someFunction(&myUnion.member1, &myUnion.member2);

in cases where the function would cause the value of one member of myUnion to be changed between accesses made via the other. While the ability to take union members' addresses would have been pretty useless if code couldn't do things like:

someFunction1(&myUnion.member1);
someFunction2(&myUnion.member2);
someFunction3(&myUnion.member1);

the authors of the Standard expected that quality implementations intended for various purposes would process constructs that invoke Undefined Behavior "in a documented fashion characteristic of the environment" when doing so would best serve those purposes, and thus thought that making support for such constructs a quality-of-implementation issue would be simpler than trying to formulate precise rules for which patterns must be supported. A compiler that generated code for the called functions in the second example without knowing their calling context wouldn't be able to interleave accesses performed by the two functions, and a quality compiler that expanded them inline while processing the above code would have no trouble noticing when each pointer was derived from myUnion.

The authors of the C89 Standard didn't think it necessary to define precise rules for how pointers to union members behave, because they thought compiler writers' desire to produce quality implementations would drive them to handle appropriate cases sensibly even without such rules. Unfortunately, some compiler writers were too lazy to handle cases like the second example above, and rather than recognizing that there was never any reason for quality compilers to be incapable of handling such cases, the authors of later C and C++ Standards have bent over backward to come up with weirdly contorted, ambiguous, and contradictory rules that justify such compiler behavior.

As a result, the address-of operator should only be regarded as meaningfully applicable to union members in cases where the resulting pointer will be used for accessing individual bytes of storage, either using character-types directly, or passing to functions like memcpy that are defined in such fashion. Unless or until there's a major revamp of the Standard, or an appendix that describes means by which implementations can offer optional guarantees beyond what the Standard requires, it would be best to pretend that union members are--like bitfields--lvalues that don't have addresses.
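
A minimal sketch of the byte-wise approach recommended above, assuming the union `A` from the question. The names `read_leading_int` and `demo` are hypothetical. The helper copies the first `sizeof(int)` bytes of the union's storage with `std::memcpy` rather than dereferencing a pointer to any particular member.

```cpp
#include <cassert>
#include <cstring>

union A {
    int a;
    struct {
        int b;
    } c;
};

// Hypothetical helper: copies the leading int-sized bytes of the union's
// storage into a separate int, regardless of which member is active.
int read_leading_int(const A& u) {
    static_assert(sizeof(A) >= sizeof(int), "union must hold at least an int");
    int out;
    std::memcpy(&out, &u, sizeof out);
    return out;
}

// Hypothetical usage: the same helper works before and after the
// active member changes.
void demo() {
    A x;
    x.a = 10;
    assert(read_leading_int(x) == 10);
    x.c.b = 12;
    assert(read_leading_int(x) == 12);
}
```

This relies on `b` being the first member of a standard-layout struct, so that it starts at the same address as the union itself. For other layouts the offset would have to be adjusted.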

supercat