
For a certain class F, a pointer to it (created via new F()) can be up-cast to a pointer to any of its base classes, e.g. to B*, C*, D* and E*.

[Figure: inheritance diagram of classes B, C, D, E and F]

Is it guaranteed that, for a given compiler (a given configuration and a given .exe program), the difference in bytes between the up-cast addresses of any pair of the mentioned classes (e.g. any 2 among B*, C*, D* and E*) is the same constant for every instance created by new F()?

For example, this MCVE prints 8 all 10 times for me:

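#include <cstdint>   // std::uintptr_t, std::intptr_t
#include <iostream>
class B{ public: virtual ~B(){} };
class D: virtual public B{};
class C: virtual public B{};
class E: virtual public D{};
class F: virtual public E, virtual public C{};
int main(){
    for(int n=0;n<10;n++){
        F* f = new F();  // intentionally never deleted, so every instance is distinct
        C* c=f;          // implicit up-cast to the virtual base C
        E* e=f;          // implicit up-cast to the virtual base E
        // byte distance between the two base subobjects; a signed type as wide
        // as a pointer avoids the narrowing that int could cause
        std::intptr_t offset=
            static_cast<std::intptr_t>(
            (reinterpret_cast<std::uintptr_t>(c))
            -
            (reinterpret_cast<std::uintptr_t>(e)));
        std::cout<<offset<<std::endl;
    }
}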

I believe the answer is yes, because I can static_cast it.
(And, shamefully, I have unconsciously relied on this assumption for a long time.)

Different compilers may print different offsets, but I care only about the case of one (same) program and one (same) true underlying class (new F()).

I hope the value is always constant in every situation. If so, I don't have to fix my program.

I would be glad if an answer also quoted the C++ specification.

Edit: Fixed an inconsistency in my code (thanks to curiousguy's comment).

javaLover
  • Possible duplicate of [memory layout C++ objects](https://stackoverflow.com/questions/1632600/memory-layout-c-objects) – apple apple May 08 '19 at 03:32
  • @apple apple In the real case, I want to cast between `C*` & `E*` easily - they are both referred to as `void*` in my program, and I always know the true underlying class. The custom cast happens in a far away `.cpp` where `C` & `E` are undefined types. – javaLover May 08 '19 at 03:33
  • a `void*`? it may be better `B*` and use `dynamic_cast`. – apple apple May 08 '19 at 03:40
  • @apple apple I agree that `dynamic_cast` is usually better. In my situation, it is not really possible though. The class that wants to cast is in the low-level engine layer, while `C` & `E` are defined in the user layer. Furthermore, unfortunately, the custom casting is also on the CPU-critical path. – javaLover May 08 '19 at 03:47
  • There isn't even a guarantee that if a base class subobject has no virtual base, then inside a c/d-tor its representation is exactly the one of a complete object. There is no offsetof for base classes (virtual or non virtual). You can't do arithmetic AFAIK. – curiousguy May 09 '19 at 00:07
  • Your Q about layout is OK (w/o the code example), but the code is meaningless: **none of your `static_cast` to ptr type is useful**. Just remove them. What do you think `static_cast` means? That is the problem here. Please explain your argument re: `static_cast`. – curiousguy May 09 '19 at 00:10
  • **`F` is not virtual derived from `C`** so either code is in error or the graphic inheritance rep is. – curiousguy May 09 '19 at 00:11
  • `static_cast<int>(reinterpret_cast<uintptr_t>(c))` you might as well cast to `int` directly... `uintptr_t` is for portability on archs where pointers are larger than `int`: if `int` is too small then you lose information (implementation-defined) anyway. – curiousguy May 09 '19 at 00:16
  • @curiousguy I believed `static_cast` makes things obvious, but I will remove it. I will edit the code so that `F` derives virtually from `C`. I am not accustomed to the virtual inheritance syntax, sorry. About `uintptr_t`, if I reinterpret cast to int directly, I will get `error: cast from 'C*' to 'int' loses precision` (coliru [MCVE](http://coliru.stacked-crooked.com/a/ab21baeadf0fd859)). Thanks a lot for reviewing my question. – javaLover May 09 '19 at 01:35
  • @javaLover 1) `static_cast` of ptr to a ptr makes it obvious a ptr conversion that involves a hierarchy movement occurs, but not its direction (up or down) or whether its up to virtual or non virtual. 2) A cast that loses precision is not "ill formed", the code should be translated (allowed to compile) with a warning unless you are in "all warnings are error" mode. Either way, you are only suppressing the warning not the underlying issue with two casts in a row. Hiding underlying issues with casts is a bad idea. – curiousguy May 09 '19 at 02:09
  • How is it possible to write code whose correctness depends (non-gratuitously) on such a constancy of offset? – Davis Herring May 09 '19 at 03:05
  • @curiousguy because they basically ask same thing. know the layout then offset is determined. I don't know how memory layout can be answered without language specification, it just lack a language-lawyer tag I think. – apple apple May 09 '19 at 03:26
  • @Davis Herring It is my ancient p̶r̶e̶m̶a̶t̶u̶r̶e̶ optimization from the early days when I learnt C++. I heard that `dynamic_cast` is slow, so I cached the offset. Now, I accept that it is a bad practice, but my ancient code has worked very well and fast so far. – javaLover May 09 '19 at 03:51
  • @appleapple One Q is about what is generally done, this Q is about what variation is permitted by the std. To know what a particular impl does, just check the documented ABI. My answer doesn't mention the ABI as ABIs are inherently impl dependent. – curiousguy May 09 '19 at 03:53
  • @DavisHerring Imagine you want to downcast from a non polymorphic virtual base. You would need to find the offset (which in practice is a constant in every impl). Or if you wanted to do conversions before the lifetime has started. (GCC sometimes generates crashing code when converting ptr to (= nearly empty, polymorphic w/o datamembers) virtual bases (that happen to be primary base) from inside ctors (where ction vtable are used).) – curiousguy May 09 '19 at 03:56
  • @curiousguy: The non-polymorphic example is insightful: in that case, a `static_cast` should be (but isn’t) possible iff the target type is `final`. – Davis Herring May 09 '19 at 04:03
  • @javaLover: Your ancient optimization at best has unspecified behavior (from converting to a pointer an integer that is not the result of converting a pointer). You should be able to use multiple `static_cast`s to get the effect you want in most cases (at least if you don’t make *every* inheritance `virtual`). – Davis Herring May 09 '19 at 04:06
  • @Davis Herring "from converting to a pointer an integer that is not the result of converting a pointer ==> unspecified behavior" That is such a strong statement. If you have time, could you give some reference, please? My peace depends on that feature. – javaLover May 09 '19 at 04:21
  • @DavisHerring "_from converting to a pointer an integer that is not the result of converting a pointer_" That's an extreme and unwarranted interpretation IMO. You should be able to do such casts iff the number would have been the result of the reverse conversion. – curiousguy May 09 '19 at 04:25
  • @javaLover In case of doubt, you can always throw in some `volatile` (which disables info propagation on the variable value) or `std::launder` – curiousguy May 09 '19 at 04:28
  • @javaLover You may be interested in [p0137r1](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0137r1.html). – curiousguy May 09 '19 at 04:31
  • @curiousguy: It’s actually [implementation-defined](http://eel.is/c++draft/expr.reinterpret.cast#5), but an implementation may choose for it to be *undefined* behavior (at least to read or write through the result). It is a very real topic of debate among WG21 (C++) and WG14 (C) as to exactly which integers may be profitably converted this way, and some [real implementations do optimizations](https://wandbox.org/permlink/cFzRZrjD4GaBYYUJ) under the assumption that it is in general impossible to “guess” a pointer value. – Davis Herring May 09 '19 at 05:45
  • @DavisHerring At one extreme, postulating that the address of a complete object cannot be "guessed" (by peeping with gdb) and can only be derived from its syntactically taken address is certainly a reasonable assumption. At the other extreme, assuming that you can't derive a ptr to an array element from a ptr to another one w/o ptr arithmetic is not. The code you link to is extremely distasteful, but a dynamic linker may do something like that. – curiousguy May 09 '19 at 06:19

1 Answer


[Foreword: Terminology:

For simplicity, we roughly follow the Itanium C++ ABI and relax/generalize terminology:

A proper base B of D is a real base class, distinct from D. An improper base is a proper base or D itself.

A proper subobject of a class D is a member or proper base; an improper subobject is a proper subobject or D itself.

--end foreword]

[Foreword: Mistaken assumption:

You seem to be under the impression that expressing a type conversion as static_cast somehow guarantees that no complex code will be generated. This isn't the case: a static_cast can call anything a direct initialisation can, including a constructor, as in static_cast<String>("")

--end foreword]
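
To make the point about static_cast concrete, here is a minimal sketch. The `Label` type and its counter are hypothetical names invented for this illustration; they only stand in for any class with a converting constructor.

```cpp
#include <cassert>
#include <string>

// Hypothetical type, for illustration only: it counts how many times its
// converting constructor runs.
struct Label {
    std::string text;
    explicit Label(const char* s) : text(s) { ++construction_count(); }
    static int& construction_count() { static int n = 0; return n; }
};

// Returns how many times Label(const char*) ran because of the static_cast.
inline int constructions_caused_by_static_cast() {
    const int before = Label::construction_count();
    // static_cast performs a direct-initialisation here, so it calls the
    // explicit constructor Label(const char*): arbitrary user code runs.
    Label l = static_cast<Label>("hello");
    assert(l.text == "hello");
    return Label::construction_count() - before;
}
```

So a static_cast says nothing about the cost of the conversion; it only says that the conversion is selected at compile time.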

There is no plausible reason for an implementation not to put every base class subobject at a fixed known offset in a complete (or most derived, which has the same layout in practice) object of type D; so the question is: is there anything preventing a perverse implementation from doing otherwise?

What would it take for an implementation to have distinct layouts, and would such an implementation be conforming? We need to list the pointer movements (implicit conversions or casts) supported inside a class hierarchy.

[Note that access to an inherited (non static) data member is defined as an (implicit) conversion of this followed by access to a non inherited data member, and so is a call to an inherited non static member function.]

For:

  • X an (improper) base subobject of D
  • Y a (proper) base subobject of X (doesn't really matter that Y is proper actually)
  • Z is another (proper) base of D

ASCII-art summary (tree may be collapsed):

Y
|
X     Z
 \   /
   D

These bases must be unambiguous: the base subobject Y must be the one and only of that type in X. (But an indirect base Y of D doesn't have to be unambiguous: D::Y may not designate a single base, only D::X::Y must be unambiguous.)

Three kinds of "simple" hierarchy movements must be supported:

  • (up) conversion (that can be done implicitly) of X* to Y*
  • (down NV) for a non virtual base Y of X: down cast of Y* to X* can be performed by static_cast
  • (down P) for (possibly virtual) polymorphic base Y of X: down cast of Y* to X* can be performed by dynamic_cast
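
The three movements listed above can be sketched as follows. The class names `XN` and `XV` are hypothetical, chosen only to separate the non virtual case from the virtual one.

```cpp
#include <cassert>

// Hypothetical hierarchy, for illustration only.
struct Y  { virtual ~Y() = default; };  // polymorphic base
struct XN : Y {};                       // Y is a non virtual base of XN
struct XV : virtual Y {};               // Y is a virtual base of XV

// (up) followed by (down NV): static_cast is enough for a non virtual base.
inline bool up_then_down_nonvirtual() {
    XN x;
    Y* y = &x;                          // (up) implicit conversion
    XN* back = static_cast<XN*>(y);     // (down NV)
    return back == &x;
}

// (up) followed by (down P): a virtual base requires dynamic_cast.
inline bool up_then_down_virtual() {
    XV x;
    Y* y = &x;                          // (up) implicit, through a virtual base
    // static_cast<XV*>(y) would be ill-formed because Y is a virtual base.
    XV* back = dynamic_cast<XV*>(y);    // (down P), needs Y to be polymorphic
    assert(back != nullptr);
    return back == &x;
}
```

Both functions return true; the difference is only in which cast the language lets you write for the way back down.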

Another, more complex movement is a dynamic_cast from Y* to Z*: it consists of two movements, a down cast followed by an up cast, going through the most derived object D, whose type doesn't have to be statically known. (Code performing these two steps explicitly would have to be able to name D.)
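
As a small sketch of that equivalence, the following compares the one-step cross cast with the explicit two-step path. The hierarchy mirrors the ASCII-art above, with Y made a virtual base of X as an assumption for the example.

```cpp
#include <cassert>

// Hypothetical hierarchy matching the diagram: Y above X, and X and Z above D.
struct Y { virtual ~Y() = default; };
struct X : virtual Y {};
struct Z { virtual ~Z() = default; };
struct D : X, Z {};

inline bool cross_cast_matches_two_steps() {
    D d;
    Y* y = &d;                              // start from the base subobject Y

    Z* direct = dynamic_cast<Z*>(y);        // one-step cross cast, D not named

    D* most_derived = dynamic_cast<D*>(y);  // step 1: down to the most derived object
    assert(most_derived != nullptr);
    Z* two_step = most_derived;             // step 2: implicit up conversion to Z

    return direct != nullptr && direct == two_step;
}
```

The one-step form is the only one available to code that cannot name `D`.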

In general, these operations are performed on at least partially constructed objects.

(The C++ standard has not made a clear decision on whether conversion to a pointer to a non virtual base is supported on a pointer to an un-constructed object. Anything involving a virtual base clearly cannot be done on an un-constructed object.)

So in practice the (improper) subobject X must carry enough information to locate its virtual bases, usually either by explicitly putting their addresses in hidden members, or by storing their offsets in the vtable.

[It implies that during construction of a proper base class with virtual bases, the vtable cannot be the same in general as the vtable for a complete object. This is unlike the construction of base subobjects (whether virtual or not) with only non virtual bases.]
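
A short sketch of why an X subobject needs run time information to find its virtual base: the same conversion X* to B* has to work whether the X is a complete object or is embedded in a larger one. All names below are hypothetical, and any number the helper returns depends on the implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical hierarchy, for illustration only.
struct B { virtual ~B() = default; long b = 0; };
struct X : virtual B { long x = 0; };
struct W { virtual ~W() = default; long w = 0; };
struct D : X, W { long d = 0; };

// Signed byte distance from an X subobject to its virtual base B.
// The conversion X* -> B* is resolved at run time by the implementation
// (typically through the vtable or a hidden member); the result is not
// something the standard specifies.
inline std::intptr_t distance_to_virtual_base(X* x) {
    assert(x != nullptr);
    B* b = x;
    return reinterpret_cast<std::intptr_t>(b) - reinterpret_cast<std::intptr_t>(x);
}

// Whatever the layout, the path through X and the direct path from D must
// reach the same, single B subobject.
inline bool reaches_the_shared_base(D& d) {
    X* x = &d;
    B* via_x = x;
    B* direct = &d;
    return via_x == direct;
}
```

With common ABIs, calling `distance_to_virtual_base` on a stand-alone `X` and on the `X` inside a `D` typically gives two different values, which is exactly why the distance cannot be hard-coded from the static type `X` alone.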

If the virtual bases are located through the vtable, it means that there are no more possible class layouts than there are distinct vtables. The constructor of the most derived class would not be able to randomize the layout (unless vtables are emitted on the spot to describe said layout).

If the virtual bases are located through hidden data members, it would seem that a perverse implementation has more flexibility. This is compounded by the fact that down casts from a polymorphic virtual base must be supported: a polymorphic base only knows its dynamic type (through the vptr in all existing implementations). A derived class could store an array of addresses (or offsets) of (some, any) of its base classes, but a base cannot store information about each of its derived classes, by construction, as its layout must be defined before knowing which classes will be derived from it (it's easy to see that sizeof(T) cannot be, even in the most perverse implementation, a monotonic function of the class definitions that are not used in T but that use T).

But a perverse implementation could still support multiple layouts, by either of these approaches:

(multiple-vtables)

If vtables are generated on the spot, or if many vtables are created a priori to allow for different layouts of a class that has virtual bases, then a polymorphic base could have access to enough information to do any down cast.

[Note that there is already no one to one mapping between unique types and vtables in general, so the equality test of typeid, even of expressions of the same type, cannot be an address comparison between vptrs in general. In general, type equality is implemented by comparing typeinfo pointers.]
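
From the user's side, this means the portable way to test a dynamic type is the typeid equality operator, never a comparison of raw vptr values. A minimal sketch, with hypothetical class and function names:

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical hierarchy, for illustration only.
struct B { virtual ~B() = default; };
struct F : virtual B {};

// True iff the dynamic type of b is exactly F. How the comparison is done
// (typeinfo pointers, names, ...) is left to the implementation.
inline bool has_dynamic_type_F(const B& b) {
    return typeid(b) == typeid(F);
}

// Usage example.
inline void typeid_demo() {
    F f;
    B b;
    assert(has_dynamic_type_F(f));   // a B subobject of a complete F
    assert(!has_dynamic_type_F(b));  // a complete B
}
```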

[Note that if you are using a "DLL" and the dynamic linker, various "equivalent" (identical in their symbolic definition before linking) type information tables (vtables and typeinfo structures) could exist at different addresses, but such non-fusing linking breaks the ODR anyway, as the static objects wouldn't be fused either.]

(BLIP)

Every polymorphic potential base class (one that can possibly be used as a base class, so final or local classes could be exempt) would have at least one additional hidden member: a pointer (alternatively a relative offset) to a member in the most derived class: the (base-table), a hidden non static table member.

The (base-table) would list the addresses (or offsets) of (some, any) base classes, the base-locators (those bases the perverse implementation wants to reorder).

The BLIP (base-locators-info-ptr) is a pointer to a typeinfo-like structure that contains the description of the base-locators layout, as the layout depends on the type of the most derived class, which isn't known at compile time; note that the BLIP could be stored inside the vtable, as it is type dependent, not instance dependent.

The down casts would locate the (base-table), which is opaque and impossible to interpret by code that only knows about the base class, and then use the BLIP to decode it, just as the typeinfo data contains the information needed to navigate the base classes to implement a down or up dynamic_cast.

This would seem exceptionally complicated and difficult to get right, and for what practical purpose?

curiousguy