19

enter image description here

CAT *p;
...
p->speak();
...

Some book said that the compiler will translate p->speak() to:

(*p->vptr[i])(p); //i is the idx of speak in the vtbl

My question is: at compile time it is impossible to know the real type of p, so it is impossible to know which vptr or vtbl to use. How, then, does the compiler generate correct code?

[modified]

For example:

void foo(CAT* c)
{
    c->speak();
    //if c points to a SmallCat
    // should translate to (*c->vptr[i])(c); //use vtbl at 0x1234
    //if c points to a CAT
    // should translate to (*c->vptr[i])(c); //use vtbl at 0x5678

    //since ps and pc are both CAT*, how can the compiler generate different code
    //for them at compile time?
}

...
CAT *ps,*pc;
ps = new SmallCat;  //suppose SmallCat's vtbl address is 0x1234;
pc = new CAT;       //suppose CAT's vtbl address is 0x5678;
...
foo(ps);
foo(pc);
...

Any ideas? Thanks.

camino

4 Answers

20

What your picture is missing is an arrow from each CAT and SmallCat object to its corresponding vtbl. The compiler embeds a pointer to the vtbl in the object itself; one can think of it as a hidden member variable. That is why it is said that adding the first virtual function "costs" you one pointer per object in memory footprint. The pointer to the vtbl is set up by code in the constructor, so all a compiler-generated virtual call needs to do to reach the right vtable at runtime is read that hidden pointer through the object pointer (this).

Of course this gets more complicated with virtual and multiple inheritance: the compiler needs to generate a slightly different code, but the basic process remains the same.

Here is your example explained in more details:

CAT *p1,*p2;
p1 = new SmallCat;  //suppose its vtbl address is 0x1234;
// The layout of a SmallCat object includes a vptr as a hidden member.
// At this point, the value of this vptr is set to 0x1234.
p2 = new CAT;       //suppose its vtbl address is 0x5678;
// The layout of a CAT object also includes a vptr as a hidden member.
// At this point, the value of this vptr is set to 0x5678.
(*p1->vptr[i])(p1); //should use vtbl at 0x1234
// The generated code has enough information to do that, because the constructor
// squirreled away 0x1234 inside the SmallCat object when it was constructed.
(*p2->vptr[i])(p2); //should use vtbl at 0x5678
// Same deal - the constructor saved 0x5678 inside the CAT, so we're good.
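
To make the point concrete, here is a minimal sketch of the question's scenario. The class bodies, the return values, and the `demo` helper are assumptions added for illustration; only the names `CAT`, `SmallCat`, `speak`, and `foo` come from the question. `foo` is compiled once, yet it reaches a different function depending on which object it receives, because the choice is made by reading data stored in the object at run time.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the classes in the question.
struct CAT {
    virtual ~CAT() = default;
    virtual std::string speak() const { return "CAT"; }
};

struct SmallCat : CAT {
    std::string speak() const override { return "SmallCat"; }
};

// Compiled exactly once. On typical implementations the generated code
// loads the hidden vptr stored in *c and calls through the slot for speak().
std::string foo(const CAT* c) { return c->speak(); }

// Passes two different dynamic types through the same static type.
void demo() {
    SmallCat small;
    CAT plain;
    assert(foo(&small) == "SmallCat");
    assert(foo(&plain) == "CAT");
}
```

Calling `demo()` from `main` exercises both paths through the single compiled body of `foo`.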
Sergey Kalinichenko
  • If we don't know the type of p, how can we know whether to use the vtbl of CAT or the vtbl of SmallCat? At compile time we don't have the real object, so we can't use its vptr to get the corresponding vtbl, right? – camino Feb 04 '14 at 21:01
  • @camino The compiler does not know which table it's getting, but both tables are laid out in the same order, so when you pick, say, the second record, you get the `eat` function, no matter which subclass of `Cat` it is. – Sergey Kalinichenko Feb 04 '14 at 21:03
  • I have added an example; I hope it makes my question clear. – camino Feb 04 '14 at 21:14
  • @camino When the object is created (whenever and wherever that is), apart from what you've defined in its class and what it inherits, a hidden member (that we call a vptr) is added to the object, its value being a pointer to the vtbl of its actual class (which is known, because objects are created using explicitly stated constructors). Therefore, it doesn't matter what variable or type of expression you later use to store the reference (to that object): the vptr of that object will always point to the vtbl of the correct class, as it has done since that object's creation. – Theodoros Chatzigiannakis Feb 04 '14 at 21:16
  • Thank you. I moved "pc->speak()" into a function; how does the compiler handle it in that case? – camino Feb 04 '14 at 21:47
  • @camino The pointer to the vtbl is always embedded at the same place in every object of a class derived from a particular class. When you pass a `Cat` to `foo`, `c->vptr` evaluates to `0x5678`; when you pass a `SmallCat`, `c->vptr` evaluates to `0x1234`. The call to `speak` dereferences it, grabs the pointer stored in `vtbl[i]`, and calls the function that pointer refers to. – Sergey Kalinichenko Feb 04 '14 at 21:54
  • The vptr is located at the same offset in CAT and SmallCat, right? – camino Feb 04 '14 at 22:07
  • @camino Correct, the compiler ensures that they are. – Sergey Kalinichenko Feb 04 '14 at 22:11
  • @camino: In fact, the `vptr` is almost always the first member – Mooing Duck Feb 04 '14 at 22:46
9

which means it is impossible to know which vptr or vtbl to be use

That's correct during method invocation. But at construction time, the type of the constructed object is actually known, and the compiler will generate code in the ctor to initialize the vptr to point to the vtbl of the corresponding class. All the later virtual method invocations will call the method in the right vtbl via this vptr.

For more details on how exactly this initialization works with base objects (with multiple ctors being called in sequence), please refer to this answer to a similar question.
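
A small sketch of what "multiple ctors being called in sequence" means for virtual calls. The names `Base`, `Derived`, `name`, `seen_in_ctor`, and `demo` are made up for this illustration. While the `Base` constructor runs, the object is still treated as a `Base`, so a virtual call made there resolves to `Base::name`; once construction finishes, the same call reaches `Derived::name`.

```cpp
#include <cassert>
#include <string>

struct Base {
    std::string seen_in_ctor;
    // During this constructor the dynamic type is still Base; on typical
    // implementations the vptr points at Base's table at this moment.
    Base() { seen_in_ctor = name(); }
    virtual ~Base() = default;
    virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
    // Derived's constructor runs after Base's and, on typical
    // implementations, re-points the vptr at Derived's table.
    std::string name() const override { return "Derived"; }
};

void demo() {
    Derived d;
    assert(d.seen_in_ctor == "Base");   // call made while only Base existed
    const Base& ref = d;
    assert(ref.name() == "Derived");    // call made on the finished object
}
```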

Community
user3146587
  • That's also the reason why you shouldn't call virtual functions in the constructor. – Alexander Oh Feb 04 '14 at 20:59
  • @Alex: There's no reason to shun it in C++: the behavior is safe (as opposed to Java, where you might touch uninitialized members, and where it's therefore discouraged). People who discourage it in C++ are usually confusing 2 languages, which is a problem with those persons not the languages. – MSalters Feb 04 '14 at 21:23
  • @MSalters: People discourage it in C++ not because it's unsafe, but because it's unintuitive, and therefore error prone. – Mooing Duck Feb 04 '14 at 21:26
  • @MooingDuck: I wonder where that "intuition" comes from. If from another language, see previous comment. But in general, when I'm building a Derived object and I've only finished the Base object, _my_ intuition says it ought to behave like a Base object. – MSalters Feb 04 '14 at 21:30
  • @MSalters I'd say people get confused if they don't know how C++ builds its objects. Theoretically it might also be the other way round: I'm building the derived object, and once I've set up the vptr I go to the base classes and initialize them. In general this behaviour might turn out to be surprising. If you know what the compiler is doing, it's perfectly valid to do so. It's not about causing undefined behaviour, but about confusing other people who might read it. – Alexander Oh Feb 04 '14 at 21:39
  • @Alex: You're supposed to know the order of construction anyway, not just because of virtual functions but because all members are created in base-to-derived order. And C++ goes to some lengths to help you: `this->` allows access only to members already constructed. – MSalters Feb 04 '14 at 21:42
  • @MSalters True. People writing code in C++ should know a lot about the language. I like to see people know about SFINAE, metaprogramming, overload resolution, CRTP etc. Unfortunately not all of them do. (And yes, I agree one should expect a certain knowledge of C++ if the person claims to know C++.) PS: Oh, I didn't know that `this->` part! I've got to check whether compilers actually check that or ignore it. – Alexander Oh Feb 04 '14 at 21:50
  • @Alex: Except for library writers, most C++ programmers only need to understand overload resolution from that list. The course of learning really should be: absolute basics; steps of compilation; headers; basic types including `std::string`; basic library algorithms; OO; RAII (exceptions); overloading; containers & iterators; `switch`; ... – MSalters Feb 04 '14 at 22:01
6

The compiler implicitly adds a pointer called vptr to every object of a class that has one or more virtual functions.

You can tell this by using sizeof on such a class and seeing that it is larger than you'd expect by 4 or 8 bytes, depending on sizeof(void*).
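
A quick way to observe this is sketched below. `Plain`, `Polymorphic`, and `vptr_overhead` are hypothetical names, and the exact sizes are implementation-defined, so the sketch only relies on the polymorphic type being larger, which holds on the mainstream compilers I know of.

```cpp
#include <cstddef>

// Two hypothetical types that differ only in having virtual functions.
struct Plain {
    void* data = nullptr;
    void speak() {}
};

struct Polymorphic {
    void* data = nullptr;
    virtual ~Polymorphic() = default;
    virtual void speak() {}
};

// Extra bytes occupied by the polymorphic type on this implementation.
// Typically equal to sizeof(void*), but that is not guaranteed.
constexpr std::size_t vptr_overhead() {
    return sizeof(Polymorphic) - sizeof(Plain);
}
```

Printing `vptr_overhead()` on a common 64-bit platform typically shows 8, matching the "one pointer per object" rule of thumb.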

The compiler also adds to the constructor of each class, an implicit piece of code which sets vptr to point to a table of function pointers (a.k.a. V-Table).

When an object is instantiated, its type is explicitly "mentioned".

For example: A a(1) or A* p = new B(2).

So inside the constructor, during runtime, vptr can be easily set to point to the correct V-Table.

In the example above:

  • The vptr of a is set to point to the V-Table of class A.

  • The vptr of the object that p points to is set to point to the V-Table of class B.

BTW, the constructor is different from all other functions, in the fact that you have to explicitly use the object type in order to call it (hence a constructor can never be declared virtual).

Here is how the compiler generates the correct code for a virtual function p->speak():

CAT *p;
...
p = new SuperCat("SaberTooth",2); // p->vptr = SuperCat_Vtable
...
p->speak(); // See pseudo assembly code below

Ax = p               // Get the address of the instance
Bx = p->vptr         // Get the address of the instance's V-Table
Cx = Bx + CAT::speak // Add the number of the function in its class
Dx = *Cx             // Get the address of the appropriate function
Push Ax              // Push the address of the instance into the stack
Push Dx              // Push the address of the function into the stack
CallF                // Save some registers and jump to the beginning of the function

The compiler uses the same number (index) for all speak functions in the hierarchy of class CAT.

Here is how the compiler generates the correct code for a non-virtual function p->eat():

p->eat(); // See pseudo assembly code below

Ax = p        // Get the address of the instance
Bx = CAT::eat // Get the address of the function
Push Ax       // Push the address of the instance into the stack
Push Bx       // Push the address of the function into the stack
CallF         // Save some registers and jump to the beginning of the function

Since the address of the eat function is known at compile-time, the assembly code is more efficient.
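
The difference between the two call sequences shows up in behaviour as well as in cost. In the sketch below (the method bodies and the `demo` helper are assumptions for illustration), `speak` is virtual and `eat` is not, and `SmallCat` declares its own version of both. Through a `CAT*`, the virtual call is resolved from the object at run time, while the non-virtual call is bound at compile time from the pointer's static type.

```cpp
#include <cassert>
#include <string>

struct CAT {
    virtual ~CAT() = default;
    virtual std::string speak() const { return "CAT::speak"; }
    std::string eat() const { return "CAT::eat"; }  // non-virtual
};

struct SmallCat : CAT {
    std::string speak() const override { return "SmallCat::speak"; }
    // Hides CAT::eat rather than overriding it, because eat is not virtual.
    std::string eat() const { return "SmallCat::eat"; }
};

void demo() {
    SmallCat small;
    const CAT* p = &small;
    assert(p->speak() == "SmallCat::speak");  // looked up via the object
    assert(p->eat() == "CAT::eat");           // fixed by the static type CAT
    assert(small.eat() == "SmallCat::eat");   // static type is SmallCat here
}
```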

And finally, here is how 'vptr' is set to point to the correct V-Table during runtime:

class SmallCat
{
    void* vptr; // implicitly added by the compiler
    ...         // your explicit variables
    SmallCat()
    {
        vptr = (void*)0x1234; // implicitly added by the compiler
        ...                   // Your explicit code
    }
};

When you instantiate CAT* p = new SmallCat(), a new object is created, with its vptr = 0x1234

barak manos
  • At compile time we don't have the real object, so we can't use its vptr to get the corresponding vtbl, right? – camino Feb 04 '14 at 21:03
  • No, but as I said, we can add code that **during runtime** will set `vptr` to point to the right V-Table. – barak manos Feb 04 '14 at 21:04
  • I have added an example; I hope it makes my question clear. – camino Feb 04 '14 at 21:14
  • Added a "pseudo-compilation" of your example; please see revised answer. – barak manos Feb 04 '14 at 21:24
  • Thank you. I moved "pc->speak()" into a function; how does the compiler handle it in that case? – camino Feb 04 '14 at 21:48
  • The function `foo` is called like any other non-virtual function: the address of the function is known at compile-time, so the compiler simply replaces any call to `foo` with pushing the arguments onto the stack and jumping to that address. The code for `c->speak()` is replaced in a similar way to the example I gave you in the answer. – barak manos Feb 04 '14 at 21:53
  • I mean, for SmallCat *p = new SmallCat; we know that p->vptr is 0x1234, but inside function foo we have no idea what c points to, so we don't know what c->vptr is, right? – camino Feb 04 '14 at 22:00
  • Of course you do! `c` is pointing to an object of type `CAT` or an object of type `SmallCat`. When that object was created, its `vptr` field was set to point to the correct VTable. Please see updated answer. – barak manos Feb 04 '14 at 22:06
  • @camino: When you have a `CAT* p`, you can access all member objects (and functions) for that instance of a `CAT`, including the `vptr`. The `vptr` tells it which "real" functions to call. You can access these, because the compiler knows that `p` points to _some_ kind of `CAT`, even if it isn't positive which one, and _all_ types of `CAT` contain the `CAT` members (including the `vptr`) – Mooing Duck Feb 04 '14 at 22:43
4

When you write this (I've replaced all usercode with lowercase):

class cat {
public:
    virtual void speak() {std::cout << "meow\n";}
    virtual void eat() {std::cout << "eat\n";}
    virtual void destructor() {std::cout << "destructor\n";}
};

The compiler generates all of this magically (All my sample compiler code is uppercase):

class cat;
struct CAT_VTABLE_TYPE { //here's the cat's vtable type
    void(*speak)(cat* this); //contains a pointer for each virtual function
    void(*eat)(cat* this);
    void(*destructor)(cat* this);
};
extern CAT_VTABLE_TYPE CAT_VTABLE; //later is a global shared copy of the vtable
class cat { //here's the class you typed
private:
    CAT_VTABLE_TYPE* vptr; //but the compiler adds this magic member
public:
    cat() :vptr(&CAT_VTABLE) {} //the compiler initializes the vtable ptr
    ~cat() {vptr->destructor(this);} //redirects to the one you coded
    void speak() {vptr->speak(this);} //redirects to the one you coded
    void eat() {vptr->eat(this);} //redirects to the one you coded
};

//Here's the functions you programmed
void DEFAULT_CAT_SPEAK(cat* this) {std::cout << "meow\n";}
void DEFAULT_CAT_EAT(cat* this) {std::cout << "eat\n";}
void DEFAULT_CAT_DESTRUCTOR(cat* this) {std::cout << "destructor\n";}
//and the global cat vtable (shared by all cat objects)
const CAT_VTABLE_TYPE CAT_VTABLE = {
    DEFAULT_CAT_SPEAK, 
    DEFAULT_CAT_EAT, 
    DEFAULT_CAT_DESTRUCTOR};

Well, that's a lot isn't it? (I actually cheated slightly, since I take the address of an object before it's defined, but this way is less code and less confusing, even if technically uncompilable) You can see why they built it into the language. And... here's SmallCat before:

class smallcat : public cat {
public:
    virtual void speak() {std::cout << "meow2\n";}
    virtual void destructor() {std::cout << "destructor2\n";}
};

and after:

class smallcat;
//here's the smallcat's vtable type
struct SMALLCAT_VTABLE_TYPE : public CAT_VTABLE_TYPE { 
     //contains no additional virtual functions that cat didn't have
};
extern SMALLCAT_VTABLE_TYPE SMALLCAT_VTABLE; //later is a global shared copy of the vtable
class smallcat : public cat { //here's the class you typed
public:
    smallcat() {vptr = &SMALLCAT_VTABLE;} //the compiler overwrites the vtable ptr set by cat()
    //The other functions already are virtual, nothing additional needed
};
//Here's the functions you programmed
void DEFAULT_SMALLCAT_SPEAK(cat* this) {std::cout << "meow2\n";}
void DEFAULT_SMALLCAT_DESTRUCTOR(cat* this) {std::cout << "destructor2\n";}
//and the global smallcat vtable (shared by all smallcat objects)
const SMALLCAT_VTABLE_TYPE SMALLCAT_VTABLE = {
    DEFAULT_SMALLCAT_SPEAK, 
    DEFAULT_CAT_EAT, //note: eat wasn't overridden
    DEFAULT_SMALLCAT_DESTRUCTOR};

So, if that's too much to read, the compiler makes a VTABLE object for each type, which points to the member functions for that particular type, and then it sticks a pointer to that VTABLE inside each instance.

When you create a smallcat object, the compiler constructs the cat parent object, which assigns the vptr to point at the CAT_VTABLE global. Immediately after, the compiler constructs the smallcat derived object, which overwrites the vptr member to make it point at the SMALLCAT_VTABLE global.

When you call c->speak();, the compiler calls its copy of cat::speak (which looks like this->vptr->speak(this);). The vptr member might be pointing at the global CAT_VTABLE or the global SMALLCAT_VTABLE, and that table's speak pointer is therefore pointing either at DEFAULT_CAT_SPEAK (what you put in cat::speak), or DEFAULT_SMALLCAT_SPEAK (the code you placed in smallcat::speak). So this->vptr->speak(this); ends up calling the function for the most derived type, no matter what the most derived type is.

All in all, it is admittedly very confusing, since the compiler is magically renaming functions at compile time. Actually, due to multiple inheritance, in reality it's far more confusing than I've shown here.
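
For readers who want something they can actually compile, here is a reduced, hand-rolled version of the same idea. It is only a sketch: the names (`CatVtable`, `kCatVtable`, `cat_speak`, and so on) are invented for this example, the functions return strings instead of printing, and real compilers use their own layouts and naming. No `virtual` keyword appears anywhere; the dispatch is done entirely through an explicit pointer to a per-type table.

```cpp
#include <cassert>
#include <string>

struct Cat;  // forward declaration so the table type can mention Cat*

// Hand-written "vtable": one function pointer per would-be virtual function.
struct CatVtable {
    std::string (*speak)(const Cat* self);
    std::string (*eat)(const Cat* self);
};

// The function bodies the programmer "wrote".
std::string cat_speak(const Cat*) { return "meow"; }
std::string cat_eat(const Cat*) { return "eat"; }
std::string smallcat_speak(const Cat*) { return "meow2"; }

// One shared table per type; SmallCat reuses cat_eat because it does not
// override eat.
const CatVtable kCatVtable = {cat_speak, cat_eat};
const CatVtable kSmallCatVtable = {smallcat_speak, cat_eat};

struct Cat {
    const CatVtable* vptr;                 // explicit stand-in for the hidden member
    Cat() : vptr(&kCatVtable) {}           // base constructor installs Cat's table
    std::string speak() const { return vptr->speak(this); }
    std::string eat() const { return vptr->eat(this); }
};

struct SmallCat : Cat {
    // Runs after Cat() and overwrites the pointer with SmallCat's table.
    SmallCat() { vptr = &kSmallCatVtable; }
};

// Knows only about Cat, yet reaches the right function through vptr.
std::string foo(const Cat* c) { return c->speak(); }

void demo() {
    SmallCat small;
    Cat plain;
    assert(foo(&small) == "meow2");
    assert(foo(&plain) == "meow");
    assert(small.eat() == "eat");  // inherited slot, not overridden
}
```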

Mooing Duck