How is LLVM isa<> implemented?

Question

From http://llvm.org/docs/CodingStandards.html#ci_rtti_exceptions

LLVM does make extensive use of a hand-rolled form of RTTI that use templates like isa<>, cast<>, and dyn_cast<>. This form of RTTI is opt-in and can be added to any class. It is also substantially more efficient than dynamic_cast<>.

How is isa and the others implemented?

cheers. Have you taken a look at the source for the templates? — Will A, May 17 '11 at 23:39
@will: No. However i do know LLVM implements some kind of custom RTTI and uses classof. So instead of reading dozens of code files i rather see a reference manual. -edit- or some kind of overview — , May 17 '11 at 23:44
Most likely some meta-information that is added to each class type. — Xeo, May 17 '11 at 23:44

score 34 · Answer 1 · edited Jul 24 '14 at 21:28

34

First of all, the LLVM system is extremely specific and not at all a drop-in replacement for the RTTI system.

Premises

For most classes, it is unnecessary to generate RTTI information
When it is required, the information only makes sense within a given hierarchy
We preclude multi-inheritance from this system

Identifying an object class

Take a simple hierarchy, for example:

struct Base {}; /* abstract */
struct DerivedLeft: Base {}; /* abstract */
struct DerivedRight:Base {};
struct MostDerivedL1: DerivedLeft {};
struct MostDerivedL2: DerivedLeft {};
struct MostDerivedR: DerivedRight {};

We will create an enum specific to this hierarchy, with an enum member for each of the hierarchy member that can be instantiated (the others would be useless).

enum BaseId {
  DerivedRightId,
  MostDerivedL1Id,
  MostDerivedL2Id,
  MostDerivedRId
};

Then, the Base class will be augmented with a method that will return this enum.

struct Base {
  static inline bool classof(Base const*) { return true; }

  Base(BaseId id): Id(id) {}

  BaseId getValueID() const { return Id; }

  BaseId Id;
};

And each concrete class is augmented too, in this manner:

struct DerivedRight: Base {
  static inline bool classof(DerivedRight const*) { return true; }

  static inline bool classof(Base const* B) {
    switch(B->getValueID()) {
    case DerivedRightId: case MostDerivedRId: return true;
    default: return false;
    }
  }

  DerivedRight(BaseId id = DerivedRightId): Base(id) {}
};

Now, it is possible, simply, to query the exact type, for casting.

Hiding implementation details

Having the users murking with getValueID would be troublesome though, so in LLVM this is hidden with the use of classof methods.

A given class should implement two classof methods: one for its deepest base (with a test of the suitable values of BaseId) and one for itself (pure optimization). For example:

struct MostDerivedL1: DerivedLeft {
  static inline bool classof(MostDerivedL1 const*) { return true; }

  static inline bool classof(Base const* B) {
    return B->getValueID() == MostDerivedL1Id;
  }

  MostDerivedL1(): DerivedLeft(MostDerivedL1Id) {}
};

This way, we can check whether a cast is possible or not through the templates:

template <typename To, typename From>
bool isa(From const& f) {
  return To::classof(&f);
}

Imagine for a moment that To is MostDerivedL1:

if From is MostDerivedL1, then we invoke the first overload of classof, and it works
if From is anything other, then we invoke the second overload of classof, and the check uses the enum to determine if the concrete type match.

Hope it's clearer.

edited Jul 24 '14 at 21:28

Paweł Bylica

3,780
1
31
44

answered May 20 '11 at 07:45

Matthieu M.

287,565
48
449
722

ah ha. Much clearer. This doesnt sound automatic at all. It sounds like a recommended style of implementing your own data when having RTTI off. As least this is very, very efficient. +1 and i'll wait on accepting it in hopes someone working on the project can provide with additional insight – May 21 '11 at 00:44
2

@acidzombie24: Anton is working on both Clang and LLVM (and with considerably more experience than I), while I mainly work on Clang. Perhaps that @Chris Lattner will notice this post and provide his answer ;) It's not automatic at all, which is the point actually, since you don't pay for it as long as you do not use it. However I would not recomment generalizing it. LLVM and Clang are very specific projects with very specific needs, for most projects plain RTTI is really sufficient. – Matthieu M. May 21 '11 at 12:12
Do you actually have to provide a `bool classof(DerivedLeft const*)` when you also provide a `bool classof(Base const*)?` I also wonder why the trivial case of identical (or convertible) types is not handled within the `isa` machinery and why `isa_impl::doit` takes a const reference as the argument but `classof` takes a const pointer. – Stephan Tolksdorf Feb 12 '12 at 15:47
1

Great answer! Except that getValueID is not virtual but the base class has a member of BaseId type that getValueID returns. – Nikola Smiljanić Mar 21 '12 at 22:09
1

@NikolaSmiljanić: uh right, don't know what I was thinking since one of the goals is to be able to get rid of virtual pointers altogether. – Matthieu M. Mar 22 '12 at 07:11
1.A typo in `BaseId(BaseId id): Id(id) {}`,the first `BaseId` should be `Base`. 2.I tried the constructor `MostDerivedL1(): Base(MostDerivedL1Id) {}` however gcc/clang reports _'Base' is not a direct or virtual base of ..._ .3. So this implementation of `isa` only gives the __exact type__ , `isa(derivedInstance)` would be false? – Hongxu Chen Dec 31 '12 at 13:50
1

@HongxuChen: 1. Thanks! 2. My bad, `MostDerivedL1` should inherit from `DerivedLeft`, I correct the definition. 3. No, it should work for a base class too, as derived-to-base conversion will take care of transforming a `DerivedLeft*` to a `Base*`, for example, and then `Base::classof(Base const*)` will be used (and return `true`). I will add the `classof` implementation in `Base`, it's currently missing. – Matthieu M. Dec 31 '12 at 15:30
Many thanks! However I still have another question. For instance there is such a piece of code: `MostDerivedR dr;Base *pb = &dr;DerivedRight *pr = &dr;`, then `isa(*pb)` would be _false_ and `isa(*pr)` would be _true_, which seems a bit odd(I guess the __concrete__ class `DerivedRight` should also has its own default `DerivedRightId` when constructed). – Hongxu Chen Jan 01 '13 at 12:05
@HongxuChen: `DerivedRight` does have a `DerivedRightId` setup in its constructor, and both should return `true`; patching up the typos, [you can see a live demo here](http://liveworkspace.org/code/3cC35q$1) – Matthieu M. Jan 01 '13 at 12:37
@HongxuChen: Thanks for your feedback, it prompted me to fix the many typos this answer had :) – Matthieu M. Jan 01 '13 at 14:34

score 6 · Answer 2 · answered May 18 '11 at 06:47

6

Just adding stuff to osgx's answer: basically each class should implement classof() method which does all the necessary stuff. For example, the Value's classof() routine looks like this:

  // Methods for support type inquiry through isa, cast, and dyn_cast:
  static inline bool classof(const Value *) {
    return true; // Values are always values.
  }

To check whether we have a class of the appropriate type, each class has it's unique ValueID. You can check the full list of ValueID's inside the include/llvm/Value.h file. This ValueID is used as follows (excerpt from Function.h):

  /// Methods for support type inquiry through isa, cast, and dyn_cast:
  static inline bool classof(const Function *) { return true; }
  static inline bool classof(const Value *V) {
    return V->getValueID() == Value::FunctionVal;
  }

So, in short: every class should implement classof() method which performs the necessary decision. The implementation in question consists of the set of unique ValueIDs. Thus in order to implement classof() one should just compare the ValueID of the argument with own ValueID.

If I remember correctly, the first implementation of isa<> and friends were adopted from boost ~10 years ago. Right now the implementations diverge significantly :)

answered May 18 '11 at 06:47

Anton Korobeynikov

9,074
25
28

3

One note: this system works great for LLVM because it is self-contained. It is not extensible to arbitrary code because one would have issues with coordinating the attributions of `ValueIDs` between the DLLs. – Matthieu M. May 18 '11 at 07:27
@Matthieu: yes, this is correct. Though I don't see any problem if one would somehow maintain the list of ValueIDs consistent across all the DLLs. Also, one can use other stuff as value ID, e.g. the address of some static class memeber function, this will be unique across different DLLs – Anton Korobeynikov May 18 '11 at 15:20
@Anton: Indeed the address of a class member would probably be a great way to get this behavior :) (by DLLs I meant DLLs provided by various externals sources). – Matthieu M. May 18 '11 at 17:25
Theres a few things i am still confused about. Such as would this still work if it got typecast away (ie i have void*, is there a way to compare it to KnownClass*?). If i check a derived and base do i get false? Am i implementing getValueID myself? and `return V->getValueID() == Value::FunctionVal;` looks like nonsense. How is that even right? Its comparing to an enum... The enum is not unique and i dont know why that code is there. I still +1 – May 18 '11 at 18:03
@acidzombie24: if you have a `void*`, you lost all type information, so it's hard to reliably guess the type (you could maintain a map void* -> type for example). For base and derived, you have to check both cases, it's an enum, thus not polymorphic. The `enum` is unique within a given hierarchy, it's not suitable for multiple inheritance. `V->getValueID()` is a virtual call, thus you need to check which of the derived class got instantiated. – Matthieu M. May 19 '11 at 06:15
@Matthieu: I'm confused. If i need to do a check if its a based or derived that would make it much less useful. Almost useless if i need to check if types from a plugin is derived from my base class bc i wouldnt know the types. Moving on, how is the virtual call helping me? that seems key. http://llvm.org/doxygen/Value_8h_source.html#l00244 this shows the implementation returning unsigned char SubclassID; which is not unique enough. so.... I still dont understand. Maybe the implementation is not yet ready at the moment? – May 19 '11 at 19:05
@acidzombie24: it is fully ready, and very much used. The virtual call is key because you want the ID from the most derived class. If you want to act at an intermediary level, then you need to know all possible subclasses of the class you want. It's not such an issue in LLVM because the class hierarchy are usually shallow (2 or 3 levels) and the situation usually makes it clear what kind of class may be returned. But once again, you need to realize that this system has been designed *specifically* for LLVM needs. It's not surprising if it does not fit in your code/usecase. – Matthieu M. May 20 '11 at 06:11
@Matthieu actually i dont understand how a char could represent a class. There would only be 256 possibilities. I understand that virtual gets the most derived class, but i dont see how an enum can work with this at all. Also i don't see how i can check if a obj is a base of something but i am sure thats somewhere in there (unless it isnt which would be weird?) – May 20 '11 at 07:02
2

@acidzombie24: okay, I'll make an answer of my own because comments are just too restricted for this :p – Matthieu M. May 20 '11 at 07:17

score 5 · Answer 3 · answered May 18 '11 at 02:10

I should mention that http://llvm.org/docs/ProgrammersManual.html#isa - this document have some additional description.

The source code of isa, cast and dyn_cast is located in single file, and commented a lot.

http://llvm.org/doxygen/Casting_8h_source.html

00047 // isa<X> - Return true if the parameter to the template is an instance of the
00048 // template type argument.  Used like this:
00049 //
00050 //  if (isa<Type*>(myVal)) { ... }
00051 //
00052 template <typename To, typename From>
00053 struct isa_impl {
00054   static inline bool doit(const From &Val) {
00055     return To::classof(&Val);
00056   }
00057 };

00193 // cast<X> - Return the argument parameter cast to the specified type.  This
00194 // casting operator asserts that the type is correct, so it does not return null
00195 // on failure.  It does not allow a null argument (use cast_or_null for that).
00196 // It is typically used like this:
00197 //
00198 //  cast<Instruction>(myVal)->getParent()
00199 //
00200 template <class X, class Y>
00201 inline typename cast_retty<X, Y>::ret_type cast(const Y &Val) {
00202   assert(isa<X>(Val) && "cast<Ty>() argument of incompatible type!");
00203   return cast_convert_val<X, Y,
00204                           typename simplify_type<Y>::SimpleType>::doit(Val);
00205 }

00218 // dyn_cast<X> - Return the argument parameter cast to the specified type.  This
00219 // casting operator returns null if the argument is of the wrong type, so it can
00220 // be used to test for a type as well as cast if successful.  This should be
00221 // used in the context of an if statement like this:
00222 //
00223 //  if (const Instruction *I = dyn_cast<Instruction>(myVal)) { ... }
00224 //
00225 
00226 template <class X, class Y>
00227 inline typename cast_retty<X, Y>::ret_type dyn_cast(const Y &Val) {
00228   return isa<X>(Val) ? cast<X, Y>(Val) : 0;
00229 }

How is LLVM isa<> implemented?

3 Answers3

Linked