Why can a base pointer access a derived member variable in a virtual function?

Question

class Base {
public:
    virtual void test() {};
    virtual int get() {return 123;}
private:
    int bob = 0;
};

class Derived: public Base{
public:
    virtual void test() { alex++; }
    virtual int get() { return alex;}
private:
    int alex = 0;
};
int main() {
    Base* b = new Derived();
    b->test();  // virtual call: runs Derived::test()
}

When test and get are called, the implicit this pointer is passed in. Is it because a Derived object contains a part whose memory layout is identical to that of a plain Base object, so the this pointer works both as a Base pointer and as a Derived pointer?

Put another way, is the memory layout of Derived like this?

vptr <-- this
bob
alex

That is why it can use alex in b->test(), right?

It is unclear what you are asking here. A lot of this doesn't make sense. — bcr, Apr 26 '18 at 21:13
Your example is the definition of [Polymorphism](https://stackoverflow.com/questions/5854581/polymorphism-in-c). — scohe001, Apr 26 '18 at 21:15

Remy Lebeau · Answer 1 · 2018-04-26T23:13:18.303

Inside Derived's methods, the implicit this pointer is always a Derived* pointer (more generally, the type of this always matches the class that defines the method being called). That is why Derived::test() and Derived::get() can access the Derived::alex member. That has nothing to do with Base.
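A quick way to check this is to ask the compiler for the type of `this` inside each override. The sketch below is a trimmed, hypothetical variant of the question's classes (members made public, a virtual destructor added, and a helper `call_through_base` invented for the demo); the `static_assert`s only compile if `this` is `Base*` in `Base::get()` and `Derived*` in `Derived::get()`.

```cpp
#include <type_traits>

class Base {
public:
    virtual ~Base() = default;
    virtual int get() {
        // In a Base member function, `this` has type Base*.
        static_assert(std::is_same<decltype(this), Base*>::value, "this is Base*");
        return 123;
    }
    int bob = 0;
};

class Derived : public Base {
public:
    int get() override {
        // In a Derived member function, `this` has type Derived*,
        // which is why Derived members such as alex are reachable.
        static_assert(std::is_same<decltype(this), Derived*>::value, "this is Derived*");
        return alex;
    }
    int alex = 7;
};

// Hypothetical helper: calls get() through a Base* so the virtual call
// is dispatched to Derived::get().
int call_through_base() {
    Derived d;
    Base* b = &d;
    return b->get();
}
```

`call_through_base()` returns 7, the value of `alex`, even though the call was made through a `Base*`.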

The memory layout of a Derived object begins with the data members of Base (in typical implementations, including Base's hidden vtable pointer), followed by optional padding, followed by the data members of Derived. That allows you to use a Derived object wherever a Base object is expected. When you convert a Derived* pointer to a Base* pointer, or a Derived& reference to a Base& reference, the compiler adjusts the pointer/reference by an offset it knows at compile time, so that it points at the Base portion of the Derived object.
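The derived-to-base conversion can be observed without assuming a particular layout. This sketch (public members for brevity, and hypothetical helper names `base_part_is_inside` and `round_trips`) checks two properties that hold on mainstream implementations: the `Base*` obtained from a `Derived*` points somewhere inside the `Derived` object, and `static_cast` back to `Derived*` undoes whatever adjustment was applied. Comparing addresses through `std::uintptr_t` is implementation-defined but widely supported.

```cpp
#include <cstdint>

struct Base {
    virtual ~Base() = default;
    int bob = 0;
};

struct Derived : Base {
    int alex = 0;
};

// True if the Base* produced by the implicit Derived* -> Base* conversion
// points into the storage occupied by the Derived object.
bool base_part_is_inside(const Derived& d) {
    const Base* bp = &d;  // implicit derived-to-base conversion
    auto start = reinterpret_cast<std::uintptr_t>(&d);
    auto where = reinterpret_cast<std::uintptr_t>(bp);
    return where >= start && where < start + sizeof(Derived);
}

// Converting back with static_cast reverses any adjustment the compiler made.
bool round_trips(Derived& d) {
    Base* bp = &d;
    return static_cast<Derived*>(bp) == &d;
}
```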

When you call b->test() at runtime, where b is a Base* pointer, the compiler knows test() is virtual, so it generates code that reads the appropriate slot from the vtable of the object b points at and calls the function that slot points to. But the compiler doesn't know which derived type b actually points at at runtime (that is the whole magic of polymorphism), so it can't adjust the implicit this pointer to the correct derived pointer type at compile time.
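The point that the dynamic type is only known at run time can be seen with a function that is compiled once and then handed different objects. In this sketch, the function `ask` and the extra class `Other` are hypothetical additions; the same call `b.get()` ends up in three different functions depending on the object passed in.

```cpp
struct Base {
    virtual ~Base() = default;
    virtual int get() { return 123; }
};

struct Derived : Base {
    int get() override { return alex; }
    int alex = 42;
};

// Hypothetical second derived class, only here to show a third target.
struct Other : Base {
    int get() override { return -1; }
};

// Compiled once. Which get() runs is decided at run time by reading the
// vtable pointer stored in the object that `b` refers to.
int ask(Base& b) { return b.get(); }
```

`ask` returns 123 for a plain `Base`, 42 for a `Derived`, and -1 for an `Other`.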

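In the case where b is pointing at a Derived object, the object's hidden vtable pointer points at Derived's vtable. The compiler knows the exact offset between the start of the Base portion and the start of the Derived object, so the slot for test() in Derived's vtable can point to a small compiler-generated stub (often called a thunk) that adjusts the implicit Base* this pointer into a Derived* this pointer before jumping into the actual implementation of Derived::test(). In common single-inheritance layouts that offset is zero, so compilers usually point the slot straight at Derived::test(); non-trivial adjustments mostly show up with multiple inheritance.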

Behind the scenes, it is roughly (not exactly) implemented like the following pseudo-code:

void Derived_test_stub(Base *this)
{
    Derived *adjusted_this = reinterpret_cast<Derived*>(reinterpret_cast<uintptr_t>(this) + offset_from_Base_to_Derived);
    Derived::test(adjusted_this);
}

int Derived_get_stub(Base *this)
{
    Derived *adjusted_this = reinterpret_cast<Derived*>(reinterpret_cast<uintptr_t>(this) + offset_from_Base_to_Derived);
    return Derived::get(adjusted_this);
}

void* vtable_Base[2] = {&Base::test, &Base::get};
void* vtable_Derived[2] = {&Derived_test_stub, &Derived_get_stub};

Base::Base()
{
    this->vtable = vtable_Base;
    bob = 0;
}

Derived::Derived() : Base()
{
    Base::vtable = vtable_Derived; // overwrite the vtable pointer set by Base()
    alex = 0;
}

...

Base *b = new Derived;

//b->test(); // calls Derived::test()...
typedef void (*test_type)(Base*);
reinterpret_cast<test_type>(b->vtable[0])(b); // calls Derived_test_stub()...

//int i = b->get(); // calls Derived::get()...
typedef int (*get_type)(Base*);
int i = reinterpret_cast<get_type>(b->vtable[1])(b); // calls Derived_get_stub()...

The actual details are a bit more involved, but that should give you a basic idea of how polymorphism is able to dispatch virtual methods at runtime.
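For readers who want to run the idea, the pseudo-code above can be turned into ordinary C++ by emulating the vtable by hand. Everything here is hypothetical and only mirrors the answer's model: `BaseObj`, `DerivedObj`, `VTable`, `adjust`, the `construct_*` functions and the stubs are invented names, no `virtual` keyword is used, and real compilers lay things out differently.

```cpp
#include <cstddef>

// Hand-rolled emulation of a vtable; not actual compiler output.
struct BaseObj;
struct VTable {
    void (*test)(BaseObj* self);
    int (*get)(BaseObj* self);
};

struct BaseObj {              // plays the role of class Base
    const VTable* vptr;       // hidden pointer every polymorphic object carries
    int bob;
};

struct DerivedObj {           // plays the role of class Derived : public Base
    BaseObj base;             // Base portion comes first
    int alex;
};

// Method bodies take an explicit self pointer of their own class type.
void Base_test(BaseObj*) {}
int Base_get(BaseObj*) { return 123; }
void Derived_test(DerivedObj* self) { self->alex++; }
int Derived_get(DerivedObj* self) { return self->alex; }

// Turns the BaseObj* the caller has into the DerivedObj* the body needs.
// Here the Base portion is the first member, so the offset is 0.
DerivedObj* adjust(BaseObj* self) {
    char* raw = reinterpret_cast<char*>(self) - offsetof(DerivedObj, base);
    return reinterpret_cast<DerivedObj*>(raw);
}
void Derived_test_stub(BaseObj* self) { Derived_test(adjust(self)); }
int Derived_get_stub(BaseObj* self) { return Derived_get(adjust(self)); }

// One table per class, shared by all objects of that class.
const VTable vtable_Base = {&Base_test, &Base_get};
const VTable vtable_Derived = {&Derived_test_stub, &Derived_get_stub};

// "Constructors" set the vptr; the derived one overwrites the base one.
void construct_Base(BaseObj* self) {
    self->vptr = &vtable_Base;
    self->bob = 0;
}
void construct_Derived(DerivedObj* self) {
    construct_Base(&self->base);
    self->base.vptr = &vtable_Derived;
    self->alex = 0;
}

// What b->test() followed by b->get() boils down to.
int dispatch_demo() {
    DerivedObj d;
    construct_Derived(&d);
    BaseObj* b = &d.base;     // like Base* b = new Derived;
    b->vptr->test(b);         // reaches Derived_test via the stub
    return b->vptr->get(b);   // reads alex back
}
```

`dispatch_demo()` returns 1: the emulated `test()` increments `alex` once and the emulated `get()` reads it back. Using a plain `BaseObj` instead would route `get()` to `Base_get` and return 123.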

Thank you for the answer. Regarding "The compiler knows the exact offset of the start of Derived from the start of Base": I thought both Derived and Base started at the same address, so the offset would always be 0, right? — Jerry, Apr 28 '18 at 16:29
@Jerry no. The offset of `Derived` is the sum of the sizes of all of `Base`'s data members and padding. A `Derived` object is also a `Base` object, so it has everything that `Base` defines plus everything `Derived` defines — Remy Lebeau, Apr 28 '18 at 19:25
But when a `Base` pointer points to a `Derived` object, it basically points to the start address of `Derived` Object or the `Base` part of that `Derived` object, right? — Jerry, Apr 29 '18 at 22:50
@Jerry a `Base*` pointer points to the `Base` portion of the `Derived` object. The `Derived` portion is offset from that. Accessing `Derived` members is relative to the `Derived` portion, not the `Base` portion, so a `Derived*` pointer is needed. During virtual method dispatching, that means adjusting a `Base*` pointer into a `Derived*` pointer — Remy Lebeau, Apr 30 '18 at 00:27
That's interesting! What about a `Derived` member function that accesses a public `Base` member variable? Since the `this` pointer is a `Derived` class pointer, if it points to the start address of the `Derived` portion, how does it get back to the `Base` portion? — Jerry, May 05 '18 at 13:50
I think I misunderstood your reply. I just did a quick test: I printed a `Base` pointer and the `this` pointer from a virtual function inside of `Derived`, and they pointed to the same address — Jerry, May 05 '18 at 13:55
@Jerry since the compiler knows the offset of member variables inside of `Base`, and the offset between `Base` and `Derived`, then inside a `Derived` method, the compiler can perform a simple calculation to reach `Base` member variables from a `Derived*` pointer. — Remy Lebeau, May 05 '18 at 16:46
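The experiment Jerry describes can be reproduced with a short program. In this sketch the virtual `where()` method and the `same_address` helper are hypothetical; the override simply returns its own `this`. With single inheritance on mainstream compilers (GCC, Clang, MSVC) the `Base` portion sits at the start of the object, so both printed addresses are typically identical. The C++ standard does not require that, and it generally stops being true for a second base class under multiple inheritance.

```cpp
#include <cstdio>

struct Base {
    virtual ~Base() = default;
    virtual const void* where() const { return this; }
    int bob = 0;
};

struct Derived : Base {
    // Here `this` is a const Derived*; return its address as const void*.
    const void* where() const override { return this; }
    int alex = 0;
};

// Compares the Base* the caller holds with the `this` seen inside
// the Derived override, printing both addresses.
bool same_address(const Derived& d) {
    const Base* bp = &d;
    const void* seen_inside = bp->where();
    std::printf("Base* = %p, this in Derived::where() = %p\n",
                static_cast<const void*>(bp), seen_inside);
    return static_cast<const void*>(bp) == seen_inside;
}
```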

score 1 · Accepted Answer · answered Apr 26 '18 at 21:16

What you've shown is reasonably accurate, at least for a typical implementation. It's not guaranteed to be precisely as you've shown it (e.g., the compiler might insert some padding between bob and alex), but either way it "knows" that alex is at some predefined offset from this, so it can take a pointer to Base, calculate the correct offset from it, and use what's there.
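The "predefined offset" can be made visible in code. This sketch (public members and the helpers `alex_offset` and `read_member` are hypothetical) measures the distance from the start of a `Derived` object to `alex` and shows it is the same for every object; a pointer to data member such as `&Derived::alex` is the language-level way to name that fixed position. The actual value of the offset is implementation-specific, so the sketch does not assume one.

```cpp
#include <cstddef>
#include <cstdint>

struct Base {
    virtual ~Base() = default;
    int bob = 0;
};

struct Derived : Base {
    int alex = 0;
};

// Byte distance from the start of a Derived object to its alex member,
// measured through std::uintptr_t (implementation-defined, but it works
// on mainstream compilers).
std::size_t alex_offset(const Derived& d) {
    return reinterpret_cast<std::uintptr_t>(&d.alex) -
           reinterpret_cast<std::uintptr_t>(&d);
}

// A pointer to data member encodes "this member lives at a fixed position
// inside every Derived"; d.*member applies it to one particular object.
int read_member(const Derived& d, int Derived::*member) { return d.*member; }
```

Two different `Derived` objects report the same `alex_offset`, and `read_member(obj, &Derived::alex)` reads each object's own `alex`.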

Not what you asked about, so I won't go into detail, but a fair warning: computing such offsets can get a bit more complex when multiple inheritance is involved. Not so much for accessing a member of the most derived class, but if you access a member of a base class, the compiler basically has to compute the offset to the beginning of that base class, then add the member's offset within that base class.
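A small multiple-inheritance sketch shows the extra offset this warning is about. `First`, `Second`, and the helper functions are hypothetical names. On typical implementations, converting a `Derived*` to `Second*` moves the pointer forward to the `Second` portion, and calling an override through that `Second*` requires the compiler to move `this` back to the start of the `Derived` object (often via a thunk) before `Derived::get()` can read `alex`.

```cpp
struct First {
    virtual ~First() = default;
    int f = 1;
};

struct Second {
    virtual ~Second() = default;
    virtual int get() { return 0; }
    int s = 2;
};

struct Derived : First, Second {
    int get() override { return alex; }  // needs a Derived* `this`
    int alex = 99;
};

// True when converting Derived* to Second* changes the address,
// i.e. the Second portion does not start at the beginning of the object.
bool second_base_is_offset(Derived& d) {
    Second* sp = &d;
    return static_cast<void*>(sp) != static_cast<void*>(&d);
}

// Calling through Second* still reaches alex: `this` is adjusted back.
int call_via_second(Derived& d) {
    Second* sp = &d;
    return sp->get();
}

// Reading a member of the second base: offset to Second, then to s.
int read_second_member(Derived& d) { return d.s; }
```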

Thank you for the answer. Yeah, I'm not worried about multiple inheritance. AFAIK, the memory layout for multiple inheritance is compiler dependent. — Jerry, Apr 26 '18 at 21:39

score 0 · Answer 3 · answered Apr 26 '18 at 21:19

A derived class is not a separate class but an extension of its base. If an object is allocated as a derived type, then a pointer (which is just an address in memory) can reach everything in the derived class. Classes don't exist in assembly; the compiler keeps track of how everything is laid out in memory and performs the appropriate checks accordingly.

