What can make C++ RTTI undesirable to use?

Question

Looking at the LLVM documentation, they mention that they use "a custom form of RTTI", and this is the reason they have isa<>, cast<> and dyn_cast<> templated functions.

Usually, reading that a library reimplements some basic functionality of a language is a terrible code smell and just invites to run. However, this is LLVM we're talking of: the guys are working on a C++ compiler and a C++ runtime. If they don't know what they're doing, I'm pretty much screwed because I prefer clang to the gcc version that ships with Mac OS.

Still, being less experienced than them, I'm left wondering what are the pitfalls of the normal RTTI. I know that it works only for types that have a v-table, but that only raises two questions:

Since you just need a virtual method to have a vtable, why don't they just mark a method as virtual? Virtual destructors seem to be good at this.
If their solution doesn't use regular RTTI, any idea how it was implemented?

LLVM was not a C++ compiler at the time those decisions were made. He/they also chose to reimplement standard library functionality, and for a long time their pseudo-STL was pretty buggy. — Potatoswatter, Feb 27 '11 at 19:02
@Potatoswatter Still, they don't seem to reconsider their choices now that they do make a compiler. — zneak, Feb 27 '11 at 20:03
@Potatoswatter You might be interested in the answer that Chris Lattner (the guy behind LLVM) posted here. — zneak, Feb 28 '11 at 15:09
Kind of weird that you prefer `clang` to `gcc`, considering it is aliased to the clang backend... https://stackoverflow.com/questions/19535422/os-x-10-9-gcc-links-to-clang — Cade Brown, Sep 25 '19 at 03:51
@CadeBrown, back in 2011, Xcode 4 shipped both gcc and Clang. — zneak, Sep 25 '19 at 03:52

score 92 · Accepted Answer · edited Jan 23 '13 at 15:16

There are several reasons why LLVM rolls its own RTTI system. This system is simple and powerful, and described in a section of the LLVM Programmer's Manual. As another poster has pointed out, the Coding Standards raises two major problems with C++ RTTI: 1) the space cost and 2) the poor performance of using it.

The space cost of RTTI is quite high: every class with a vtable (at least one virtual method) gets RTTI information, which includes the name of the class and information about its base classes. This information is used to implement the typeid operator as well as dynamic_cast. Because this cost is paid for every class with a vtable (and no, PGO and link-time optimizations don't help, because the vtable points to the RTTI info) LLVM builds with -fno-rtti. Empirically, this saves on the order of 5-10% of executable size, which is pretty substantial. LLVM doesn't need an equivalent of typeid, so keeping around names (among other things in type_info) for each class is just a waste of space.

The poor performance is quite easy to see if you do some benchmarking or look at the code generated for simple operations. The LLVM isa<> operator typically compiles down to a single load and a comparison with a constant (though classes control this based on how they implement their classof method). Here is a trivial example:

#include "llvm/Constants.h"
using namespace llvm;
bool isConstantInt(Value *V) { return isa<ConstantInt>(V); }

This compiles to:

$ clang t.cc -S -o - -O3 -I$HOME/llvm/include -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -mkernel -fomit-frame-pointer
...
__Z13isConstantIntPN4llvm5ValueE:
    cmpb    $9, 8(%rdi)
    sete    %al
    movzbl  %al, %eax
    ret

which (if you don't read assembly) is a load and compare against a constant. In contrast, the equivalent with dynamic_cast is:

#include "llvm/Constants.h"
using namespace llvm;
bool isConstantInt(Value *V) { return dynamic_cast<ConstantInt*>(V) != 0; }

which compiles down to:

clang t.cc -S -o - -O3 -I$HOME/llvm/include -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -mkernel -fomit-frame-pointer
...
__Z13isConstantIntPN4llvm5ValueE:
    pushq   %rax
    xorb    %al, %al
    testq   %rdi, %rdi
    je  LBB0_2
    xorl    %esi, %esi
    movq    $-1, %rcx
    xorl    %edx, %edx
    callq   ___dynamic_cast
    testq   %rax, %rax
    setne   %al
LBB0_2:
    movzbl  %al, %eax
    popq    %rdx
    ret

This is a lot more code, but the killer is the call to __dynamic_cast, which then has to grovel through the RTTI data structures and do a very general, dynamically computed walk through this stuff. This is several orders of magnitude slower than a load and compare.

Ok, ok, so it's slower, why does this matter? This matters because LLVM does a LOT of type checks. Many parts of the optimizers are built around pattern matching specific constructs in the code and performing substitutions on them. For example, here is some code for matching a simple pattern (which already knows that Op0/Op1 are the left and right hand side of an integer subtract operation):

  // (X*2) - X -> X
  if (match(Op0, m_Mul(m_Specific(Op1), m_ConstantInt<2>())))
    return Op1;

The match operator and m_* are template metaprograms that boil down to a series of isa/dyn_cast calls, each of which has to do a type check. Using dynamic_cast for this sort of fine-grained pattern matching would be brutally and showstoppingly slow.

Finally, there is another point, which is one of expressivity. The different 'rtti' operators that LLVM uses are used to express different things: type check, dynamic_cast, forced (asserting) cast, null handling etc. C++'s dynamic_cast doesn't (natively) offer any of this functionality.

In the end, there are two ways to look at this situation. On the negative side, C++ RTTI is both overly narrowly defined for what many people want (full reflection) and is too slow to be useful for even simple things like what LLVM does. On the positive side, the C++ language is powerful enough that we can define abstractions like this as library code, and opt out of using the language feature. One of my favorite things about C++ is how powerful and elegant libraries can be. RTTI isn't even very high among my least favorite features of C++ :) !

-Chris

Except those aren't equivalent operations. LLVM `isa` doesn't respect inheritance like `dynamic_cast`. A better comparison is `if ( typeid(V) == typeid(ConstantInt *) )`, which GCC maps to a function that I'll assume calls `strcmp` on the mangled typenames. If you want to avoid `strcmp`, then you can assume the compiler doesn't dynamically generate `typeinfo` objects and use `if ( &typeid(V) == &typeid(ConstantInt *) )`, which is in theory unportable but that doesn't have to matter to you. — Potatoswatter, Feb 28 '11 at 18:54
@Potatoswatter very outdated reply but to clarify in case someone comes across this, LLVM's built-in RTTI will can handle straight downcasts through inheritance hierachies, see sample implementation of `classof` [(link)](http://llvm.org/docs/HowToSetUpLLVMStyleRTTI.html) which is the basis of the the type check and casting operations...the reason this particular operation does not is because ConstantInt is a leaf class, which is hardcoded into the its implementation of `classof`. `dynamic_cast` could theoretically be optimized to handle leaf cases too at link time but isn't in practice. — Stephen Lin, Mar 04 '13 at 21:19

score 16 · Answer 2 · edited Jun 20 '20 at 09:12

16

The LLVM coding standards seem to answer this question fairly well:

In an effort to reduce code and executable size, LLVM does not use RTTI (e.g. dynamic_cast<>) or exceptions. These two language features violate the general C++ principle of "you only pay for what you use", causing executable bloat even if exceptions are never used in the code base, or if RTTI is never used for a class. Because of this, we turn them off globally in the code.

That said, LLVM does make extensive use of a hand-rolled form of RTTI that use templates like isa<>, cast<>, and dyn_cast<>. This form of RTTI is opt-in and can be added to any class. It is also substantially more efficient than dynamic_cast<>.

edited Jun 20 '20 at 09:12

Community

1
1

answered Feb 27 '11 at 18:41

Jerry Coffin

476,176
80
629
1,111

4

Except that's still hand-wavy. The linker and PGO can still figure out that those things aren't used, so even if there's an impact under some circumstances, is it really meaningful? If they are used, but only seldom, you definitely get the best of both worlds. – Potatoswatter Feb 27 '11 at 19:06
1

@Potatoswatter: it would be difficult, because LLVM and CLang are distributed as a multitude of libraries, and you do not have the clients program... – Matthieu M. Feb 27 '11 at 19:54
2

@Matthieu: If the client's binary distribution is spread over a number of executables, then locality is a write-off. I don't know how relevant code bloat is in that case. It's the client's responsibility to apply PGO on the final monolithic binary. If the organization of LLVM somehow prevents this, that's a bigger problem. – Potatoswatter Feb 27 '11 at 20:34

Mikael Persson · Answer 3 · 2011-02-27T19:36:54.747

Here is a great article on RTTI and why you might need to roll your own version of it.

I'm not an expert on the C++ RTTI, but I have implemented my own RTTI as well because there are definitely reasons why you would need to do that. First, the C++ RTTI system is not very feature-rich, basically all you can do is type casting and getting basic information. What if, at run-time, you have a string with the name of a class, and you want to construct an object of that class, good luck doing this with C++ RTTI. Also, C++ RTTI is not really (or easily) portable across modules (you cannot identify the class of an object that was created from another module (dll/so or exe). Similarly, the C++ RTTI's implementation is specific to the compiler, and it typically is expensive to turn on in terms of additional overhead to implement this for all types. Finally, it is not really persistent, so it can't really be used for file saving/loading for example (e.g. you might want to save the data of an object to a file, but you would also want to save the "typeid" of its class, such that, at load time, you know which object to create in order to load this data, that cannot be done reliably with C++ RTTI). For all or some of these reasons, many frameworks have their own RTTI (from very simple to very feature-rich). Examples are wxWidget, LLVM, Boost.Serialization, etc.. this really isn't that uncommon.

Since you just need a virtual method to have a vtable, why don't they just mark a method as virtual? Virtual destructors seem to be good at this.

That's probably what their RTTI system uses too. Virtual functions are the basis for dynamic binding (run-time binding), and thus, it is basically required for doing any kind of run-time type identification/information (not just required by the C++ RTTI, but any implementation of RTTI will have to rely on virtual calls in one way or another).

If their solution doesn't use regular RTTI, any idea how it was implemented?

Sure, you can look up RTTI implementations in C++. I have done my own and there are many libraries that have their own RTTI as well. It is fairly simple to write, really. Basically, all you need is a means to uniquely represent a type (i.e. the name of the class, or some mangled version of it, or even a unique ID for each class), some sort of structure analogous to type_info that contains all the information about the type that you need, then you need a "hidden" virtual function in each class that will return this type information on request (if this function is overriden in each derived class, it will work). There are, of course, some additional things that can be done, like a singleton repository of all types, maybe with associated factory functions (this can be useful for creating objects of a type when all that is known at run-time is the name of the type, as a string or the type ID). Also, you may wish to add some virtual functions to allow dynamic type casting (usually this is done by calling the most-derived class' cast function and performing static_cast up to the type that you wish to cast to).

actually LLVM's internal RTTI does not use virtual tables: basically each root class of every inheritance hierarchy has a constant integer value that identifies each subclass within that hierachy. — Stephen Lin, Mar 04 '13 at 21:25
@StephenLin Good to know. There are of course varying degrees of custom RTTI implementations, which usually depend on whether you need more (heavier) features than the standard RTTI, or if you need a lighter-weight version (i.e., more minimal features). And yes, there are hand-rolled / light-weight alternatives to relying on a virtual table, and I guess the LLVM team felt the need to use such an alternative. But it's still fundamentally a dynamic-dispatching mechanism, which is normally accomplished with virtual functions, but there are alternatives, of course. — Mikael Persson, Mar 05 '13 at 18:05
of course, it's basically a tag based system. Maybe they should just be using ML instead and make their lives easier :D — Stephen Lin, Mar 05 '13 at 18:07

score 4 · Answer 4 · answered Feb 27 '11 at 19:52

4

The predominant reason is that they struggle to keep memory usage as low as possible.

RTTI is only available for classes which feature at least one virtual method, which means that instances of the class will contain a pointer to the virtual table.

On a 64-bits architecture (which is common today), a single pointer is 8 bytes. Since the compiler instantiate lots of small objects, this adds up pretty quickly.

Therefore there is an ongoing effort to remove virtual functions as much as possible (and practical), and implement what would have been virtual functions with the switch instruction, which has similar execution speed but a significantly lower memory impact.

Their constant worry for memory consumption has paid off, in that Clang consumes significantly less memory than gcc, for example, which is important when you offer the library to clients.

On the other hand, it also means that adding a new kind of node usually results in editing code in a good number of files because each switch need to be adapted (thankfully compilers issue warning if you miss an enum member in a switch). So they accepted to make maintenance a tad more difficult in the name of memory efficiency.

answered Feb 27 '11 at 19:52

Matthieu M.

287,565
48
449
722

So what did they replace the 8-bytes pointer with for their own RTTI? – zneak Feb 27 '11 at 19:54
@zneak: an enum is unlikely to take 8-bytes. Most only take a single byte. That's 7-bytes less per node. A typical compilation allocates a few hundreds of thousands of nodes. Savings are counted in MB. It may not be much, but it adds up. Don't worry though, they use tools such as massif to reduce memory where it counts, and privilege speed over memory in general. – Matthieu M. Feb 27 '11 at 19:56
Are you sure about the enums? It seems that both clang and gcc make `sizeof(enum foo) == 4`, even with enums of just one element (there might be some attribute to set the size to something else though). Also, does that mean I have to change the source code of LLVM if I want to use their RTTI with my classes that are defined outside of LLVM? – zneak Feb 27 '11 at 20:11
If you have a vtable, the RTTI-structure goes into that, which is one per class, not one per object. – Macke Feb 27 '11 at 20:37
@Macke: I never said otherwise, there is one pointer per object still. @zneak: it is an implementation detail, I should have make it clear that I was speaking of the minimum, the only guarantee is that the compiler will allocate at least the necessary number of bits to represent all values. In the LLVM/CLang I seem to recall that they do not store the enum itself, but instead use bit-fields to store its value, thus achieving the desired compression effect. – Matthieu M. Feb 27 '11 at 20:43

What can make C++ RTTI undesirable to use?

4 Answers4

Linked