
I think I might be confusing myself. I know that a class with virtual functions in C++ has a vtable (one vtable per class type), so the vtable of the Base class will have one element, &Base::print(), while the vtable of the Child class will have one element, &Child::print().

When I declare my two class objects, base and child, base's vtable_ptr will point to the Base class's vtable, while child's vtable_ptr will point to the Child class's vtable. I then assign the addresses of base and child to an array of Base pointers and call base_array[0]->print() and base_array[1]->print(). My question is: both base_array[0] and base_array[1] are of type Base*, so at run time, although the vtable lookup gives the correct function pointer, how can a Base* see the members of the Child class (basically value2)? When I call base_array[1]->print(), base_array[1] is of type Base*, but at run time it finds out that it must use the Child class's print(). However, I am confused about why value2 can be accessed at that point, because I am working with type Base*... I think I must miss something somewhere.

#include <iostream>
#include <string>
using namespace std;

class Base {
public:
    int value;
    string name;
    Base(int _value, string _name) : value(_value),name(_name) {
    }

    virtual void print() {
        cout << "name is " << name << " value is " << value << endl;
    }
};

class Child : public Base{
public:
    int value2;
    Child(int _value, string _name, int _value2): Base(_value,_name), value2(_value2) {
    }

    virtual void print() {
        cout << "name is " << name << " value is " << value << " value2 is " << value2 << endl;
    }
};

int main()
{
    Base base = Base(10,"base");
    Child child = Child(11,"child",22);

    Base* base_array[2];
    base_array[0] = &base;
    base_array[1] = &child;

    base_array[0]->print();
    base_array[1]->print();

    return 0;
}
curiousguy
  • (Not all classes have virtual tables, only those with at least one virtual function or destructor.) – alfC May 26 '18 at 06:32
  • The constructor for the most derived class sets the vtable pointer for the object, so calls to virtual functions through the base pointer go through the vtable for the most derived class. – Pete Becker May 26 '18 at 06:49
  • @alfC Many compilers also use vtables for classes with virtual base classes – curiousguy May 26 '18 at 08:58
  • You learn C++. Here is a quiz: what is the parse tree of `Base* base_array[2];`? See what I mean? – curiousguy May 26 '18 at 09:05
  • @curiousguy Sorry, my understanding is that when I assign base_array[1] = &child, there should be an implicit typecast here? Will the compiler save this info? – Kaiyu Shen May 26 '18 at 17:05
  • @KaiyuShen 1) Call me pedantic, but "implicit typecast" is _not_ a thing, as a cast is a special syntax (C style cast, functional cast or `xxx_cast`) to do a conversion. Implicit conversions are a thing. 2) `base_array[1] = &child` does a derived-to-base implicit pointer conversion. – curiousguy May 26 '18 at 18:26

3 Answers


The call to print through the pointer does a vtable lookup to determine what actual function to call.

The function knows the actual type of the 'this' argument.

The compiler will also insert code to adjust the this pointer to the actual type of the object. Say you have

class child : public base1, public base2 { void print(); };

where print overrides a virtual member inherited from base2. In that case the base2 subobject (and the vtable pointer it holds) will not be at offset 0 in child, so an adjustment is needed to translate from the stored pointer value to the correct object location.

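The code for that fix-up is generated by the compiler: in common implementations, the vtable entry reached through base2 points to a small adjustor function (a "thunk") that corrects this before jumping to child::print. The vtable also carries the hidden run-time type information (RTTI) used by typeid and dynamic_cast.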

curiousguy
SoronelHaetir
  • Thanks for the clarification. So we will do some type casting when we call the corresponding virtual function's function pointer? (i.e. in my case convert base* into child*) and do something like child*->(child::funcptr)() instead of calling base*->(child::funcptr)() ? – Kaiyu Shen May 26 '18 at 06:57
  • @Kaiyu Don't even think about it! Maybe my answer will help clarify things a little. – Paul Sanders Jun 13 '18 at 06:28

I think I must miss something somewhere

Yes, and you got most of the stuff right until the end.

Here is a reminder of some really basic concepts of C/C++ (C and C++ share the same conceptual heritage, so many basic concepts are common to both, even if the fine details diverge significantly at some point). This may seem obvious and simple, but it's worth stating it loudly so that you really feel it.

Expressions are part of the compiled program; they exist at compile time. Objects exist at run time. An object (the thing) is designated by an expression (the word); they are conceptually different.

In traditional C/C++, an lvalue (short for left-value) is an expression whose runtime evaluation designates an object; dereferencing a pointer gives an lvalue (e.g. *this). It's called a "left value" because the left operand of the assignment operator must designate an object to assign to. (But not every lvalue can appear on the left of the assignment operator: an expression designating a const object is an lvalue, yet it usually cannot be assigned to.) Lvalues always have a well-defined identity and most of them have an address (only struct members declared as bit-fields cannot have their address taken, but the underlying storage still has an address).

(In modern C++, the old lvalue concept was renamed glvalue, and a new, narrower concept of lvalue was invented, instead of coining a new term for the new concept and keeping the old term for the concept of an object with an identity that may or may not be modifiable. That was, in my not so humble opinion, a serious error.)

The behavior of a polymorphic object (an object of a class type with at least one virtual function) depends on its dynamic type, that is, the type whose construction has most recently started (the class of the constructor that has started to construct data members, or has entered its body). During the execution of the body of the Child constructor, the dynamic type of the object designated by *this is Child; during the execution of the body of a base class constructor, the dynamic type is that of the base class whose constructor is running.

Dynamic polymorphism means that you can use a polymorphic object through an lvalue whose declared type (the type deduced at compile time from the rules of the language) isn't exactly the object's type, but a related type (related by inheritance). That's the whole point of the virtual keyword in C++: without it, the keyword would be completely useless!

If base_array[i] contains the address of an object (so its value is well defined and not null), you can dereference it. That gives you an lvalue whose declared type is always Base, by definition: it follows from the declared type of base_array, whose declaration is:

Base (*(base_array[2])); // extra, redundant parentheses 

which can of course be written

Base* base_array[2];

if you want to write it that way, but the parse tree (the way the declaration is decomposed by the compiler) is NOT

{ Base* } { base_array[2] }

(using curly braces to symbolically represent the grouping done by the parser)

but instead

Base { * { { base_array } [2] } }

I hope you understand that the curly braces here are my choice of meta language NOT the curly braces used in the language grammar to define classes and functions (I don't know how to draw boxes around text here).

As a beginner, it's important that you "program" your intuition correctly, so that you always read declarations the way the compiler does. If you ever declare two identifiers in the same declaration, the difference matters: int * a, b; means int (*a), b; and NOT int (*a), (*b);

(Note: even if that is clear to you, the OP, this is clearly a question of interest to beginners in C++, so that reminder of C/C++ declaration syntax might be of use to someone else.)

So, going back to the issue of polymorphism: an object of a derived type (name of the most recently entered constructor) can be designated by an lvalue of a base class declared type. The behavior of virtual function calls is determined by the dynamic type (also called real type) of the object designated by the expression, unlike the behavior of non virtual function calls; that's the semantic defined by the C++ standard.

How the compiler achieves the semantics defined by a language standard is its own problem and is not described in the standard, but when there is only one simple, efficient way to do it, all compilers do it essentially the same way (the fine details are compiler-specific), with

  • one virtual function table ("vtable") per polymorphic class
  • one pointer to a vtable ("vptr") per polymorphic object

(Both vtable and vptr are obviously implementation concepts and not language concepts, but they are so common that every C++ programmer knows them.)

The vtable is a description of the polymorphic aspects of the class: the runtime operations on an expression of a given declared type whose behavior depends on the dynamic type. There is one entry for each runtime operation. A vtable is like a struct (record) with one member (entry) per operation (all entries are usually pointers of the same size, so many people describe the vtable as an array of pointers, but I don't, I describe it as a struct).

The vptr is a hidden data member (a data member without a name, not accessible by C++ code), whose position in an object is fixed like any other data member, that can be read by the runtime code when an lvalue of polymorphic class type (call it D for "declared type") is evaluated. Dereferencing a vptr in D gives you a vtable describing a D lvalue, with entries for each runtime aspect of an lvalue of type D. By definition, the location of the vptr and the interpretation of the vtable (layout and use of its entries) are completely determined by the declared type D. (Obviously no information necessary for the use and interpretation of the vptr can be a function of the runtime type of an object: the vptr is used when that type is not known.)

The semantics of the vptr is the set of guaranteed valid runtime operations on the vptr: how the vptr can be dereferenced (the vptr of an existing object always points to a valid vtable). It's the set of properties of the form: by adding offset off to the vptr value, you get a value that can be used in "such a way". These guarantees form a runtime contract.

The most obvious runtime aspect of a polymorphic object is calling virtual functions, so a vtable for D lvalues has an entry for each virtual function that can be called on an lvalue of type D, that is, an entry for each virtual function declared either in that class or in a base class (overriders are not counted separately, as they occupy the same slot). All non-static member functions have a "hidden" or "implicit" argument, the this parameter; when compiling, it becomes a normal pointer parameter.

Any class X derived from D will have a vtable for D lvalues. For efficiency in the common, simple case of ordinary (non-virtual) single inheritance, the semantics of the vptr of the base class (which we then call a primary base class) is augmented with new properties, so the layout and semantics of the vtable for D are extended in the vtable for X: any property of a vtable for D is also a property of the vtable for X. The semantics are "inherited": there is an "inheritance" of vtables in parallel with the inheritance of the classes.

In logical terms, the guarantees increase: the guarantees of the vptr of a derived class object are stronger than the guarantees of the vptr of a base class object. Because it's a stronger contract, all code generated for a base lvalue is still valid.

[In more complex inheritance, that is either virtual inheritance, or non virtual secondary inheritance (in multiple inheritance, inheritance from a secondary base, that is any base that is not defined as "primary base"), the augmentation of the semantics of the vtable of the base class is not so simple.]

[One way to explain C++ classes implementation is as a translation to C (indeed the first C++ compiler was compiling to C, not to assembly). The translation of a C++ member function is simply a C function where the implicit this parameter is explicit, a normal pointer parameter.]

The vtable entry for a virtual function for D lvalues is just a pointer to a function whose parameter is the now-explicit this parameter: that parameter is a pointer to D, and it actually points either to the D base subobject of an object of a class derived from D, or to an object whose dynamic type is D.

If D is a primary base of X, that is, a base that starts at the same address as the derived class object and whose vptr is at the same location, then the vptr value is the same and the vptr is shared between the primary base and the derived class. It means that virtual calls (calls on lvalues that go through the vtable) to the virtual functions of X that replace a base function identically (that override it with the same return type) just follow the same protocol.

(Virtual overriders can have a different, covariant return type and a different call convention might be used in that case.)

There are other special vtable entries:

  • Multiple virtual call entries for a given virtual function signature, if the overrider has a covariant return type that requires an adjustment (because the base function's return type is not a primary base of the overrider's return type).
  • Special virtual functions: when the delete operator is used on a pointer to a polymorphic base with a virtual destructor, the deletion is done through a deleting virtual destructor, which calls the correct operator delete (the class-specific replacement, if there is one).
  • There is also a non-deleting virtual destructor that is used for explicit destructor calls: l.~D();
  • The vtable stores the offset of each virtual base subobject, used for implicit conversion to a virtual base pointer or for accessing its data members.
  • There is the offset of the most derived object for dynamic_cast<void*>.
  • An entry for the typeid operator applied to a polymorphic object (notably, the name() of the class).
  • Enough information for dynamic_cast<X*> applied to a pointer to a polymorphic object to navigate the class hierarchy at runtime and locate a given base class or derived class subobject (unless X is simply a base class of the pointer's static type, since that conversion needs no dynamic navigation).

This is just an overview of the information present in vtable and the kinds of vtable, there are other subtleties. (Virtual bases are notably more complex than non virtual bases at the implementation level.)

curiousguy

I think you might be confusing the way a pointer is declared with the type of the object it happens to be pointing to.

Forget about vtables for a moment. They're an implementation detail. They are just a means to an end. Let's look at what your code is actually doing.

So, with reference to the code you posted, this line:

base_array[0]->print();

calls into Base's implementation of print(), because the object pointed to is of type Base.

Whereas this line:

base_array[1]->print();

calls into Child's implementation of print(), because (yes, you guessed it) the object pointed to is of type Child. You don't need any fancy type casts to make this happen. It just happens, provided the method is declared virtual.

Now, inside the body of Base::print(), the compiler doesn't know (or care) whether this points to an object of type Base or an object of type Child (or any other class derived from Base, in the general case). It therefore follows that it can only access data members declared by Base (or any parent classes of Base, if there were any). Once you understand that, it's all simple enough.

But inside the body of Child::print(), the compiler does know a bit more about what this is pointing to - it has to be an instance of class Child (or some other class derived from Child). So now, the compiler can safely access value2 - inside the body of Child::print() - and your example therefore compiles correctly.

I think that's about it, really. The vtable is only there to dispatch to the correct virtual method when you call that method through a pointer whose pointed-to object's actual type is not known at compile time, as your example code is indeed doing. (*)


(*) Well, almost. Optimising compilers are getting pretty funky these days: there is actually enough information in your example for the compiler to call the relevant method directly, but please don't let that confuse the issue in any way.

Paul Sanders
  • Optimization might make it harder to understand compiled code. If this is the case, add indirections all over the place and a lot of `volatile`. That will teach this optimizer! – curiousguy Jun 14 '18 at 06:22