11

Have a look at this simple example:

struct Base { /* some virtual functions here */ };
struct A: Base { /* members, overridden virtual functions */ };
struct B: Base { /* members, overridden virtual functions */ };

void fn() {
    A a;
    Base *base = &a;
    B *b = reinterpret_cast<B *>(base);
    Base *x = b;
    // use x here, call virtual functions on it
}

Does this little snippet have Undefined Behavior?

The reinterpret_cast is well defined: it returns the value of base unchanged, just with the type B *.

But I'm not sure about the Base *x = b; line. It uses b, which has type B *, but actually points to an A object. I'm also not sure whether x is a "proper" Base pointer, i.e. whether virtual functions can be called through it.

geza
  • I don't think the casting itself leads to UB, but attempting to use `b` to call virtual functions or use `B` only members will definitely lead to UB. Think `x` is safe though. – Some programmer dude Apr 05 '19 at 08:46
  • @Someprogrammerdude: yep, the question is, "Think `x` is safe though" true or not. I have a feeling, that while this seems harmless (it's a no-op) at the first sight, it is UB. – geza Apr 05 '19 at 08:48
  • `reinterpret_cast` cannot safely convert between base and derived class pointers/references. `b` is not guaranteed to be a valid pointer. The only safe thing you can do with it is reinterpret_cast it back to the original type. – n. m. could be an AI Apr 05 '19 at 09:20
  • @n.m.: what do you mean by "safe"? The standard defines the result of this `reinterpret_cast` pretty clearly. It's a valid pointer. So I don't think that problem is about `reinterpret_cast`. More likely the problem is, which ways can the result of this cast be used. – geza Apr 05 '19 at 09:24
  • @geza You cannot access an object through this pointer. I can show you a program that crashes doing so, with any mainstream implementation. – n. m. could be an AI Apr 05 '19 at 09:26
  • @n.m.: sure. But this snippet doesn't dereference `b`. At least not in a straightforward way (does a conversion involve a dereference?). Valid pointer doesn't mean that it can be dereferenced. For example, `nullptr` is valid, but it cannot be dereferenced. `b` points to `a`, as `reinterpret_cast` returns a value which is [unchanged](http://eel.is/c%2B%2Bdraft/expr.static.cast#13): "Otherwise, the pointer value is unchanged by the conversion." – geza Apr 05 '19 at 09:30
  • @geza You are "dereferencing" `b` through `x` (and the `b` expression value cannot be used to access the memory). I think that's still UB. – BiagioF Apr 05 '19 at 09:42
  • You cannot dereference x either. – n. m. could be an AI Apr 05 '19 at 09:47
  • @n.m. yeah, that's what I've said. However, I cannot find anything in the standard which explicitly says that. – BiagioF Apr 05 '19 at 09:48
  • 1
    @n.m.: suppose that you convert `b` back with `reinterpret_cast`. It should give you a proper `Base` pointer. Now, that `reinterpret_cast` is nothing other than a conversion to `void *`, then to `Base *`. My example code does something similar (it just doesn't have the conversion to `void *`, and the conversion to `Base *` is implicit, not through `static_cast`). Anyway, I'm just playing devil's advocate here. I have a feeling that the conversion is UB, but cannot back this up with the standard. – geza Apr 05 '19 at 09:50
  • That's exactly the problem. `static_cast` and `reinterpret_cast` are **different**. They do different things. If they were the same, why would we need both of them? – n. m. could be an AI Apr 05 '19 at 11:05
  • 1
    It seems that the line regarding implicit derived-to-base pointer conversion: `The result of the conversion is a pointer to the base class subobject of the derived class object.` (https://timsong-cpp.github.io/cppwp/conv.ptr#3), means that we have indeed dereferenced `b` and so hit UB. – Lawrence Apr 05 '19 at 15:38
  • UB can occur due to the fact that there is no description of a behavior. Derived to base implicit conversion (or even cast) makes sense iff there is a derived object (or a null ptr which is special cased) – curiousguy Apr 05 '19 at 17:55
  • @curiousguy: yep, the sentence Lawrence quoted implies that a `B` object must exist at `b`. Because it doesn't exist, the implicit conversion is UB. – geza Apr 05 '19 at 20:50

4 Answers

4

static_cast (or an implicit derived-to-base-pointer conversion, which does exactly the same thing) is substantially different from reinterpret_cast. There is no guarantee that the base subobject starts at the same address as the complete object.

Most implementations place the first base subobject at the same address as the complete object, but of course even such implementations cannot place two different non-empty base subobjects at the same address. (An object with virtual functions is not empty). When the base subobject is not at the same address as the complete object, static_cast is not a no-op, it involves pointer adjustment.

There are implementations that never place even the first base subobject at the same address as the complete object. An implementation is allowed to place the base subobject after all members of the derived class, for example. IIRC the Sun C++ compiler used to lay out classes this way (don't know if it's still doing that). On such an implementation, this code is almost guaranteed to fail.

Similar code with B having more than one base will fail on many implementations. Example.
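As a rough illustration of the pointer adjustment described above, here is a minimal sketch. The names `Base1`, `Base2`, `Derived`, `Addresses` and `collect` are made up for this example and do not come from the question. It only compares addresses and never dereferences the reinterpret_cast result. Which of the two bases ends up at a non-zero offset is up to the implementation; the sketch only relies on the fact that two distinct non-empty base subobjects cannot share an address.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical hierarchy with two non-empty polymorphic bases.
struct Base1 { virtual ~Base1() = default; int x = 1; };
struct Base2 { virtual ~Base2() = default; int y = 2; };
struct Derived : Base1, Base2 {};

struct Addresses {
    const void* complete;       // address of the complete Derived object
    const void* first_base;     // after implicit conversion to Base1*
    const void* second_base;    // after implicit conversion to Base2*
    const void* reinterpreted;  // after reinterpret_cast to Base2*
};

Addresses collect(Derived& d) {
    Base1* b1 = &d;  // implicit derived-to-base conversion, may adjust
    Base2* b2 = &d;  // implicit derived-to-base conversion, may adjust
    Base2* raw = reinterpret_cast<Base2*>(&d);  // address kept as-is, never dereferenced
    return {&d, b1, b2, raw};
}

void print_addresses() {
    Derived d;
    Addresses a = collect(d);
    // The two base subobjects are distinct objects, so their addresses differ.
    assert(a.first_base != a.second_base);
    std::printf("complete      %p\n", a.complete);
    std::printf("as Base1*     %p\n", a.first_base);
    std::printf("as Base2*     %p\n", a.second_base);
    std::printf("reinterpreted %p\n", a.reinterpreted);
}
```

Calling `print_addresses()` on a mainstream implementation will typically show the `Base2*` address offset from the complete object, while the reinterpret_cast result still holds the address of the complete object. That mismatch is why using the reinterpreted pointer as a `Base2*` goes wrong.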

n. m. could be an AI
  • You've just raised another problem :) While what you're saying is true, what if `Base *base = &a;` doesn't move the pointer (this is usually the case)? Or I put an `if` into my code, so it checks for equality: `if ((void*)base==(void*)&a)`, and only if that's true, then the code does the following things (reinterpret_cast + impl. conversion). Would it be still UB? – geza Apr 05 '19 at 11:54
  • As the other answer points out, `b` is not a safely derived pointer, and so neither is `x`. It is implementation defined whether unsafely derived pointers can be dereferenced. I have just clarified the reason why it isn't considered safely derived. – n. m. could be an AI Apr 05 '19 at 12:14
  • It's strange that the standard says "A pointer value is a safely-derived pointer to a **dynamic** object". I don't know why dynamic is there. But anyways, checking the [list](https://timsong-cpp.github.io/cppwp/n4659/basic.stc.dynamic.safety#2) under it, it seems that `b` is a safely-derived pointer, as it is a "the result of a reinterpret_­cast of a safely-derived pointer value" (I think we can consider `base` as a safely-derived pointer). – geza Apr 05 '19 at 12:28
  • Indeed it's my mistake, apparently safely derived means something other than I thought. I don't know whether this is UB with the additional conditions you provide. – n. m. could be an AI Apr 05 '19 at 12:45
  • If a polymorphic base doesn't have the same address as the complete object, does that mean that the derived object gets another vptr even for simple inheritance (non virtual single inheritance)? – curiousguy Apr 05 '19 at 17:56
  • @geza In modern high level C++ a pointer isn't an address. See my questions like [Are pointer variables just integers with some operators or are they “symbolic”?](https://stackoverflow.com/q/32045888/963864) Checking for equality does nothing to make a pointer equal to another. – curiousguy Apr 05 '19 at 19:26
  • @curiousguy: `operator==` checks for address equality, which is the relevant comparison here in my opinion. – geza Apr 05 '19 at 19:31
  • @geza Read the discussion in my Q. Equality of addresses doesn't make the behavior well defined. – curiousguy Apr 05 '19 at 19:33
  • @curiousguy: n.m.'s answer is based on: if `base` doesn't have the same value as `&a`, then UB happens. That's right. But what if they do have the same address (that's a usual case, btw., it's true for all popular compilers)? If you say that it's still UB, please tell exactly why. I don't see any proof in that discussion. – geza Apr 05 '19 at 19:47
  • @geza You may want to see [this question of mine](https://stackoverflow.com/questions/47924103/pointer-interconvertibility-vs-having-the-same-address). – n. m. could be an AI Apr 05 '19 at 19:58
  • @geza See **all** my questions about ptrs in C/C++. They all discuss the same fundamental issue. Ptrs aren't trivial objects/POD types, you cannot measure the value of a ptr by looking at it or by any operation. Only a few constructs on ptr are blessed by the std. Note that the std absurdly still claims that they are trivial types, when it's clear they aren't. The only way to have a ptr to an object is a construct blessed like `&obj` or conversions of such value, or well defined arithm. `x==y` doesn't imply x and y have the same value – curiousguy Apr 05 '19 at 19:59
  • @n.m.: thanks (I've already upvoted your question a long time ago :) ), I've asked a similar question after you: https://stackoverflow.com/questions/51552713/can-stdlaunder-be-used-to-convert-an-object-pointer-to-its-enclosing-array-poi. I just cannot connect this information to my problem. – geza Apr 05 '19 at 20:12
  • @curiousguy: sure. I think I already understand C++17 pointer semantics well. Yet I don't find the exact part of the standard which makes my example UB. btw., `x==y` implies they have the same value, in the sense of addresses (which is the only relevant part here, I think). Except if one of the pointers is an end+1 pointer, and the other is not. – geza Apr 05 '19 at 20:15
  • @geza *which is the only relevant part here, I think* You are mistaken here. It is not relevant. – n. m. could be an AI Apr 05 '19 at 20:16
  • 1
    @geza this answer explains why `reinterpret_cast` is not a standard-blessed way to cast within a class hierarchy. If you want to ask whether one can exploit properties of object layout in a particular implementation in order to get more use of `reinterpret_cast` than permitted by the standard, you may want to ask a separate question (the answer will be a resounding "no"). – n. m. could be an AI Apr 05 '19 at 20:17
  • @n.m.: Yes, you're right that I need to ask another question. But I don't follow why you say it's not relevant. Your whole answer depends on this. If the address is the same (`operator==` returns true), then your answer doesn't apply. – geza Apr 05 '19 at 20:20
  • @geza Again, my answer explains why `reinterpret_cast` within a class hierarchy is not a standard way to cast. However your question doesn't really need that explanation. The simple fact that **there is no B object anywhere in your program** is sufficient to prove it has UB. If you have a pointer of type B obtained by `reinterpret_cast`, *and* there's no object of type B at that address, the only thing you can do with the pointer is `reinterpret_cast` it back to the original type. (If there *is* an object of type B there, you **may or may not** be able to do more things with the pointer). – n. m. could be an AI Apr 05 '19 at 20:26
  • @n.m.: I'm not sure that the only thing I can do is `reinterpret_cast` back. Maybe `std::launder` can be used to make the pointer "alive" as well. I kinda think that Lawrence's comment under the question has the missing part for me: "The result of the conversion is a pointer to the base class subobject of the derived class object.". This implies that there is a `B` where `b` points. But actually, there isn't, so the standard doesn't specify `Base *x = b;` in my example, hence it's UB. I understand that you basically say the same thing, but this quote was the missing part for me. – geza Apr 05 '19 at 20:43
  • It's different if you cross an ABI boundary (extern function call) with a correct address value: bit patterns make sense; it really doesn't matter how you came up with a bit pattern (implicit conversion, `reinterpret_cast`, C code, asm) as an ABI isn't compiler specific. And even if you ABI-link code compiled with the exact same compiler, it's considered two independent pieces of work. OTOH if you use global optimization, the ABI doesn't apply, C++ semantics apply. One way to do an ABI interface is `volatile`: a write to a volatile qualified object must write the value as specified by the ABI. – curiousguy Apr 05 '19 at 23:27
  • @geza You mention `std::launder`: in any language that has a use of `std::launder`, it must be the case that `a==b` doesn't imply that `a` is equivalent to `b` (for two values that can be used as arguments to `std::launder`). IOW `std::launder` proves that ptrs aren't trivial objects. – curiousguy Apr 05 '19 at 23:40
  • @geza std::launder what? You cannot reinterpret cast and then launder as there's no B object at that address. You cannot reinterpret cast then implicitly convert then launder because you have UB before you try to launder because you cannot static_cast from something that is not there. – n. m. could be an AI Apr 08 '19 at 18:03
  • @n.m.: Launder came to my mind as a device which makes a "proper" pointer from an address. But you're right, as launder cannot do any casts, we first have to cast it. So maybe you're right: in this use of `reinterpret_cast` (casting a pointer to another pointer), the result of it cannot be used for anything useful. We can only do another `reinterpret_cast` on it (perhaps `reinterpret_cast` it back). – geza Apr 08 '19 at 18:35
1

The reinterpret_cast is valid (the result can be dereferenced) if the two classes are layout-compatible; that is

  • they both have standard layout,
  • they both have the same non-static data members

But the classes do not have standard layout because one of the requirements of StandardLayoutType is that the class has no virtual functions or virtual base classes.
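The standard-layout requirement can be checked at compile time with the `<type_traits>` header. The following is only a sketch: `Plain` is a made-up class added for contrast, and the bodies of `Base` and `A` are stand-ins for the elided ones in the question.

```cpp
#include <type_traits>

// Hypothetical classes: Plain has no virtual functions, Base and A do.
struct Plain { int value; };
struct Base { virtual ~Base() = default; };
struct A : Base { int value = 0; };

// A class without virtual functions or virtual bases can be standard layout.
static_assert(std::is_standard_layout<Plain>::value,
              "Plain is a standard-layout class");

// A virtual function rules out standard layout ...
static_assert(!std::is_standard_layout<Base>::value,
              "Base has a virtual function");

// ... and so does a non-standard-layout base class.
static_assert(!std::is_standard_layout<A>::value,
              "A derives from a class that is not standard layout");

static_assert(std::is_polymorphic<A>::value, "A is polymorphic");
```

If all four assertions compile, the classes from the question cannot be layout-compatible in the sense used above.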

Regarding the validity of pointers derived from conversions, the standard has this to say in the section on "Safely-derived pointers":

6.7.4.3 Safely-derived pointers

4. An implementation may have relaxed pointer safety, in which case the validity of a pointer value does not depend on whether it is a safely-derived pointer value. Alternatively, an implementation may have strict pointer safety, in which case a pointer value referring to an object with dynamic storage duration that is not a safely-derived pointer value is an invalid pointer value unless the referenced complete object has previously been declared reachable. [ Note: The effect of using an invalid pointer value (including passing it to a deallocation function) is undefined, see 6.7.4.2. This is true even if the unsafely-derived pointer value might compare equal to some safely-derived pointer value. —end note ] It is implementation-defined whether an implementation has relaxed or strict pointer safety.

Community
P.W
  • However, the OP here does not directly access the memory with the result of `reinterpret_cast`. Another cast (`static_cast`) happens before the access. – BiagioF Apr 05 '19 at 09:51
  • But can another cast be used safely on the result of a cast that is invalid? – P.W Apr 05 '19 at 09:53
  • 1
    The result of the cast is not "invalid". The value is still valid, but cannot be dereferenced (otherwise UB). (Same as `nullptr`, which is a valid value but cannot be dereferenced.) – BiagioF Apr 05 '19 at 09:57
  • Still, the question remains. Can a "non-referenceable" be cast again into a "referenceable" one? – BiagioF Apr 05 '19 at 10:08
  • @BiagioFesta Yes with a cast back to the original type (X*->Y*->X*) – curiousguy Apr 05 '19 at 19:39
0

Yes, it does have undefined behavior. The layout of the Base subobject within A and B is undefined, so x may not point to a real Base object.
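For contrast, here is a minimal sketch of two operations that stay well defined for the hierarchy in the question: a checked downcast with `dynamic_cast`, and a reinterpret_cast round trip back to the original type, as mentioned in the comments. The `id()` member and the helper names `points_to_b` and `round_trip` are assumptions added for illustration, and the `static_assert` states the alignment condition the round trip relies on.

```cpp
#include <cassert>

// Stand-ins for the question's classes; id() is a hypothetical virtual function.
struct Base {
    virtual ~Base() = default;
    virtual int id() const { return 0; }
};
struct A : Base { int id() const override { return 1; } };
struct B : Base { int id() const override { return 2; } };

// The round trip below is only guaranteed when B is not more strictly
// aligned than Base.
static_assert(alignof(B) <= alignof(Base), "round trip needs compatible alignment");

// Checked downcast: yields nullptr when the object is not actually a B.
bool points_to_b(Base* base) {
    return dynamic_cast<B*>(base) != nullptr;
}

// Round trip through B*: the intermediate pointer is never dereferenced
// and never implicitly converted, only cast back to the original type.
Base* round_trip(Base* base) {
    B* b = reinterpret_cast<B*>(base);
    return reinterpret_cast<Base*>(b);
}
```

With `A a; Base *base = &a;`, `points_to_b(base)` reports that there is no B object, and `round_trip(base)` gives back a pointer that can be used exactly like `base`.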

water
0

If A and B are a verbatim copy of each other (except for their names) and are declared in the same context (same namespace, same #defines, no __LINE__ usage), then common C++ compilers (gcc, clang) will produce two binary representations which are fully interchangeable.

If A and B use the same method signatures but the bodies of corresponding methods differ, it is unsafe to cast A* to B* because the optimization pass in the compiler could for example partially inline the body of void B::method() at the call site b->method() while the programmer's assumption could be that b->method() will call A::method(). Therefore, as soon as the programmer uses an optimizing compiler the behavior of accessing A through type B* becomes undefined.

Problem: All compilers are always at least to some extent "optimizing" the source code passed to them, even at -O0. In cases of behavior not mandated by the C++ standard (that is: undefined behavior), the compiler's implicit assumptions - when all optimizations are turned off - might differ from programmer's assumptions. The implicit assumptions have been made by the developers of the compiler.

Conclusion: If the programmer is able to avoid using an optimizing compiler then it is safe to access A via B*. The only issue such a programmer needs to tackle is that non-optimizing compilers do not exist.


A managed C++ implementation might abort the program when A* is cast to B* via reinterpret_cast, when b->field is accessed, or when b->method() is called. Some other managed C++ implementation might try harder to avoid a program crash and so it will resort to temporary duck typing when it sees the program accessing A via B*.

Some questions are:

  • Can the programmer guess what the managed C++ implementation will do in cases of behavior not mandated by the C++ standard?
  • What if the programmer sends the code to another programmer who will pass it to a different managed C++ implementation?
  • If a case isn't covered by the C++ standard, does it mean that a C++ implementation can choose to do anything it considers appropriate in order to cope with the case?
atomsymbol
  • 1
    My question has the tag of `language-lawyer`. This means that it doesn't matter what compilers do. The question is, what the standard says. – geza Apr 08 '19 at 20:34
  • @geza Yes, although on the other hand a programmer is *always* accessing the C++ standard via a particular C++ implementation. From `language-lawyer` viewpoint, the only correct acceptable answer is just a single line: "The standard does not cover the case." - adding anything on top of that is completely superfluous from `language-lawyer` viewpoint and you shouldn't have accepted it as the best answer to your question. – atomsymbol Apr 09 '19 at 12:38