Why does C++ RTTI require a virtual method table?

Question

Why does C++ RTTI require the class to have a virtual methods table? While it seems reasonable to use the table as a means for polymorphic upcasting, it doesn't seem like it is strictly required from a design point of view. For instance, the class could contain a hash or a unique identifier that conveys the information.

For the C++ experts who consider this question overly trivial, it would help the poster of this question, who is a humble beginner at C++, to provide an explanation of why vtables are required from a design point of view for RTTI, as well as what are the other design approaches (instead of using vtables) to implement RTTI (and why they work/don't work as well as vtables).

If you had a hash or something like that, where would you put it? The vtable is the only class-invariant structure an object has available to it. — Mark Ransom, Jul 22 '20 at 02:03
@MarkRansom I'm not too familiar with C++ to be sure. But from a compiler point of view, it seems that it's also possible to keep for every class a variable with a fixed name (e.g. the required `char ID` field in LLVM's passes for them to be properly registered into the old pass manager). Or maybe reserve a certain preprocessor directive for RTTI-enabled classes. I'm not sure. — BearAqua, Jul 22 '20 at 02:06
Are you asking about replacing virtual tables or the virtual pointer in objects? The table is a bunch of pointers to virtual functions, so I don't see how a hash could possibly replace it. — happydave, Jul 22 '20 at 02:13
@happydave No; I'm not. Imagine using RTTI for classes that don't really need virtual methods. The common way to do so would be to implement e.g. a virtual destructor that really doesn't do anything. It seems like in this case, RTTI is only using the vtable pointers to keep track of inheritance information. Arguably a pointer is a bit too large for keeping track of this information only; a single byte seems to suffice? In other words, I don't see why RTTI needs the _entire_ virtual table's information. — BearAqua, Jul 22 '20 at 02:16
Related: [How are virtual functions and vtable implemented?](https://stackoverflow.com/q/99297/430766) — bitmask, Jul 22 '20 at 02:16
Also related: https://stackoverflow.com/questions/4352032/alternative-virtual-function-calls-implementations. A single byte is probably not enough, since it's hard to bound the number of types to 256. But it's not uncommon to implement a pseudo-RTTI this way in C++. E.g. https://llvm.org/docs/HowToSetUpLLVMStyleRTTI.html — happydave, Jul 22 '20 at 02:21
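
For readers wondering what the tag-based pseudo-RTTI mentioned in the comment above might look like, here is a minimal hand-written sketch. It does not use any LLVM headers; the names `Shape`, `Circle`, `Square`, `Kind`, and this local `dyn_cast` are hypothetical and only mimic the general pattern of a kind tag plus a `classof` check.

```cpp
// Hypothetical hierarchy with no virtual functions, so objects carry no vptr.
struct Shape {
  enum Kind : unsigned char { kCircle, kSquare };  // one-byte type tag
  explicit Shape(Kind k) : kind(k) {}
  Kind kind;
};

struct Circle : Shape {
  Circle() : Shape(kCircle) {}
  // Answers "is this Shape really a Circle?" by inspecting the tag.
  static bool classof(const Shape* s) { return s->kind == kCircle; }
};

struct Square : Shape {
  Square() : Shape(kSquare) {}
  static bool classof(const Shape* s) { return s->kind == kSquare; }
};

// Hand-written stand-in for dynamic_cast: returns nullptr on a tag mismatch.
template <typename To>
const To* dyn_cast(const Shape* s) {
  return To::classof(s) ? static_cast<const To*>(s) : nullptr;
}
```

The cost of this design is that every class in the hierarchy must be listed in the `Kind` enum by hand, which is why it suits closed hierarchies better than a general language feature.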

score 2 · Accepted Answer · answered Jul 22 '20 at 02:44

From a language perspective, the answer is: it doesn't. Nowhere in the C++ standard does it say how virtual functions are to be implemented. The compiler is free to make sure the correct function is called however it sees fit.

So, what would be gained by replacing the vptr (not the vtable) with an id and dropping the vtable? (Replacing the vtable itself with an id doesn't help at all: once you have resolved the vptr, you already know the run-time type.)
How does the runtime know which function to actually call?

Consider:

template <int I>
struct A {
  virtual void foo() {}
  virtual void bar() {}
  virtual ~A() {}
};

template <int I>
struct B : A<I> {
  virtual void foo() {}
};

Suppose your compiler gives A<0> the ... let's call it vid ... 0 and A<1> the vid 1. Note that A<0> and A<1> are completely unrelated classes at this point. What happens if you say a0.foo() where a0 is an A<0>? A non-virtual function would just result in a statically dispatched call. But for a virtual function, the address of the function to call must be determined at runtime.

If all you had was vid 0, you'd still have to encode which function you want. This would result in a forest of if-else branches to figure out the correct function pointer.

if (vid == 0) {
  if (fid == 0) {
    call A<0>::foo();
  } else if (fid == 1) {
    call A<0>::bar();
  } /* ... */
} else if (vid == 1) {
  if (fid == 0) {
    call A<1>::foo();
  } else if (fid == 1) {
    call A<1>::bar();
  } /* ... */
} /* ... */

This would get out of hand. Hence, the table. Add an offset that identifies the foo() function to the base of A<0>'s vtable and you have the address of the actual function to call. If you have a B<0> object on your hands instead, add the offset to that class' table's base pointer.

In theory, compilers could emit if-else code for this, but it turns out a pointer addition is faster and the resulting code is smaller.
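
The table lookup described above can be sketched by hand. This is only an illustration of the idea, not what any particular compiler emits; `Fn`, `Slot`, `Object`, `a_table`, `b_table`, and `call` are all hypothetical names, and free functions stand in for the member functions.

```cpp
#include <cstddef>
#include <string>

// Every slot in the table holds a pointer to a function with this signature.
using Fn = std::string (*)();

std::string a_foo() { return "A::foo"; }
std::string a_bar() { return "A::bar"; }
std::string b_foo() { return "B::foo"; }

// Slot indices play the role of the "fid" in the if-else version.
enum Slot : std::size_t { kFoo = 0, kBar = 1, kSlotCount = 2 };

// One static table per class. B overrides foo and inherits bar from A.
const Fn a_table[kSlotCount] = {a_foo, a_bar};
const Fn b_table[kSlotCount] = {b_foo, a_bar};

// Each object carries a single pointer to its class's table (the "vptr").
struct Object {
  const Fn* vptr;
};

// Dispatch is one indexed load plus one indirect call,
// no matter how many classes or functions exist.
std::string call(const Object& obj, Slot slot) { return obj.vptr[slot](); }
```

Calling `call(obj, kFoo)` on an object whose `vptr` points at `b_table` reaches `b_foo` without any branching on a class id.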

score 0 · Answer 2 · answered Jul 22 '20 at 02:44

Vtables are a very efficient way of providing virtual functions. For the price of a single pointer per object, every instance of the class can share the same static vtable.

Adding a second bunch of static information per class would require a second pointer per object. It's much easier to make the existing vtable pointer do double duty.
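
A rough sketch of that "double duty", assuming a hand-rolled per-class record rather than a real compiler-generated vtable. `ClassInfo`, `Shape`, and the helper functions are hypothetical; the `type_name` field stands in for the type information that ABIs such as the Itanium C++ ABI store alongside the function slots.

```cpp
#include <string>

// Hypothetical per-class record: type information and function slots side by side.
struct ClassInfo {
  const char* type_name;      // stand-in for the class's type information
  std::string (*describe)();  // stand-in for the virtual function slots
};

std::string describe_circle() { return "round"; }
std::string describe_square() { return "boxy"; }

// One static record per class, shared by all of its instances.
const ClassInfo circle_info{"Circle", describe_circle};
const ClassInfo square_info{"Square", describe_square};

struct Shape {
  const ClassInfo* info;  // the only per-object overhead: a single pointer
};

// Both queries go through the same pointer.
std::string type_name_of(const Shape& s) { return s.info->type_name; }
std::string describe(const Shape& s) { return s.info->describe(); }
```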

Answer 3 · Daniel · answered Jul 22 '20 at 03:17

In the end, it all comes down to history and trade-offs.
On one hand, C++ needs to be compatible with C: specifically, standard-layout types must have the same layout as in C, which leaves no room for RTTI data in the object.
On the other hand, adding RTTI to a vtable incurs no size cost per instance.
The designers of C++ combined these two facts into the current design: only polymorphic types have dynamic RTTI information.
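
A small sketch of that difference, using hypothetical class names. `typeid` applied to a reference of non-polymorphic type reports the static type, while for a polymorphic type it reports the dynamic type of the object.

```cpp
#include <typeinfo>

struct PlainBase {};                 // no virtual functions
struct PlainDerived : PlainBase {};

struct PolyBase { virtual ~PolyBase() = default; };  // polymorphic
struct PolyDerived : PolyBase {};

// Non-polymorphic: typeid sees only the static type, PlainBase.
bool plain_sees_derived() {
  PlainDerived d;
  PlainBase& ref = d;
  return typeid(ref) == typeid(PlainDerived);
}

// Polymorphic: typeid looks up the dynamic type, PolyDerived.
bool poly_sees_derived() {
  PolyDerived d;
  PolyBase& ref = d;
  return typeid(ref) == typeid(PolyDerived);
}
```

Here `plain_sees_derived()` returns false and `poly_sees_derived()` returns true.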

You can still obtain the static RTTI information and make your own layout for a non-polymorphic type:

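#include <typeinfo>  // std::type_info and typeid

template<typename T>
struct S
{
    // Static type information for T, bound when the S<T> is constructed.
    const std::type_info &type = typeid(T);
    T value;
};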

You can even pass around void pointers to value: they point to an object laid out exactly like a T, and you know that the enclosing S stores a type info reference ahead of it.
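
A minimal usage sketch for the wrapper above. The `holds` helper is a hypothetical addition, not part of the answer; it simply compares the stored `type_info` against a requested type.

```cpp
#include <typeinfo>

template <typename T>
struct S {
  const std::type_info& type = typeid(T);  // static type information for T
  T value;
};

// Hypothetical helper: does this wrapper record T as its payload type?
template <typename T, typename U>
bool holds(const S<U>& s) {
  return s.type == typeid(T);
}
```

For an `S<int>`, `holds<int>` is true and `holds<double>` is false, even though `int` itself is not polymorphic and carries no type information of its own.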
