In C++, is a "public unsigned int type" in base class, a faster alternative to dynamic_cast?

Question

In C++, "dynamic_cast" being slow is a known fact. I thought of following simple way of knowing the type of an object in a hierarchy. Could someone please explain if this could be slower than dynamic_cast? And if not, then why isn't it a common practice, given that speed is the worst drawback of C++ over C?

struct Base {
  unsigned m_type;
  Base(unsigned type): m_type(type) {}
  Base(): m_type(0) {}
};
struct Derived1: Base {
  Derived1(): Base(1) {}
  Derived1(int type): Base(type) {}
};
struct Derived2: Base {
  Derived2(): Base(2) {}
};
struct Derived3: Derived1 {
  Derived3(): Derived1(3) {}
};
void my_func(Base * p) {
  if (p - > m_type == 0) {} 
  else if (p - > m_type == 1) {} 
  else if (p - > m_type == 2) {} 
  else if (p - > m_type == 3) {}
}

""dynamic_cast" being slow is a known fact" can you give some reference? Its completely new to me that `dynamic_cast` is known to be slow — 463035818_is_not_an_ai, Sep 25 '18 at 13:56
Profile it and find out. Normally dynamic cast is a design flaw though. You can normally refactor the code so you don't have to use it. — NathanOliver, Sep 25 '18 at 13:56
" speed is the worst drawback of C++ over C" yet another claim that I find rather surprising. Where did you get that from? — 463035818_is_not_an_ai, Sep 25 '18 at 13:57
Sigh... Its *"slowness"* is only an issue when people abuse it. The fact you think you need a home-brewed version is a strong indication your code could use some re-factoring to eliminate a bunch of casts. — StoryTeller - Unslander Monica, Sep 25 '18 at 13:57
It may be difficult to benchmark this. You need to be running optimized build however you need to make sure your code is not optimized out. And then this will be soo fast that you need to many calls to get an accurate timing. — drescherjm, Sep 25 '18 at 13:58
btw there is no public static enum in your code. I could agree with the "public", but its neither static nor an enum...I think I know what you mean, but imho putting it inside "" is not enough to avoid the confusion — 463035818_is_not_an_ai, Sep 25 '18 at 14:00
Why is it that this can't be implemented as a `virtual` function? — François Andrieux, Sep 25 '18 at 14:01
I'm voting to close this question as off-topic because any answer will very likely be very speculative. This is something that needs to be bench-marked, not asked about on SO. — StoryTeller - Unslander Monica, Sep 25 '18 at 14:03
https://stackoverflow.com/questions/4050901/performance-of-dynamic-cast — drescherjm, Sep 25 '18 at 14:03
"Knowing the type" is not the same as getting a pointer or reference to the derived type from a pointer or reference to the base type. So, yes, your version is undoubtedly faster, because it doesn't do as much. — Pete Becker, Sep 25 '18 at 14:04
there are two massive drawbacks of your appraoch, 1) if you make a typo then two classes have the same `m_type` 2) `m_type` isnt really static, but each instance has that member, for small classes this is a problem — 463035818_is_not_an_ai, Sep 25 '18 at 14:04
I have to say I completely disagree with the downvotes this question is getting. Now, it's true that OP's _suggestion_ is to be rejected, but OP's _question_ as such is fine IMO. — einpoklum, Sep 25 '18 at 14:09
dynamic_cast was slow 30 years ago when c++ compilers were new. This has not been true for 25 years. Nowadays the compiler generates very optimal code and I would be surprised if you could write a faster version manually. As a side note. The g++ compiler was slow as the cast operation was doing a class name string comparison at each level of class hierarchy as it tried to find the correct type record (this was fixed prior to version 3.0). — Martin York, Sep 25 '18 at 14:15
My question is will an extra 20ns (or similar) per usage cause your application a bottleneck? — drescherjm, Sep 25 '18 at 14:20
Normally you can delete your own questions. Not sure what happens if it is answered however. — drescherjm, Sep 25 '18 at 14:29
Actually I don't think its a bad question, even though some of the assumptions in the text are wrong. And @einpoklum gave a good answer. — drescherjm, Sep 25 '18 at 14:30
Yes, need to be careful with my words. Too assertive and subjective statements. May be just stick to more code and less words. Yes, I got the answer. Was grappling with this thought for many days! — Makarand Kokane, Sep 25 '18 at 14:33
@drescherjm "_Normally you can delete your own questions. Not sure what happens if it is answered however._" If the answer, on a question, contains an answer with positive score - one cannot delete the question. — Algirdas Preidžius, Sep 25 '18 at 14:37
Thanks, I was on one hand a little upset that the question would be deleted and on the other understanding the reason for deletion. — drescherjm, Sep 25 '18 at 15:02
Please do not delete the question. This is a reasonable question for someone new to C++ or with very outdated understanding of C+++, and had very good answers. — Eljay, Sep 25 '18 at 15:32
If you do know all the derived classes when you are writing the base class, it's possible that a variant would be a better choice: `std::variant` (or `boost::variant ` before C++17) — Justin, Sep 25 '18 at 16:14
@MakarandKokane: don't delete the question. Almost all the people wrong here. `dynamic_cast` is (still) slow. Much slower than comparing an `int` in the class. Not to mention, if you need a switch on a type, `m_type` based solution will be much-much faster (because m_type need to queried once, but `dynamic_cast` need to be executed for each type). — geza, Sep 25 '18 at 19:28

einpoklum · Answer 1 · 2018-09-25T14:16:36.063

Could someone please explain if this could be slower than dynamic_cast?

It might well be slower than dynamic_cast'ing, when ...

the compiler can figure out what you're dynamic_cast'ing from and thus avoid doing anything for the dynamic cast.
the Base-derived objects are immutable, allowing for all sorts of optimization, while your classes contain a mutable type variable.
you have a large and complex hierarchy of types, so you have lots of comparisons.

plus, have you actually checked to see how slow dynamic_cast is relative to other things you might be doing?

Why isn't it a common practice [?]

Because it makes the code more complicated, adds unintended potential semantics and features, and goes against the grain of the language's abstraction mechanisms.
Because it necessitates the base class knowing about all of its derived classes (or otherwise you essentially end up reimplementing dynamic casting).
Because it makes some parts of the class-specific code local to the class definition, and other parts of the class-specific code local to the base class.

... given that speed is the worst drawback of C++ over C?

It isn't. It is quite possible (and common) these days to write better-performing code in C++ than the semantic equivalent in C.

But that should not matter anyway, since you have no business doing dynamic-cast'ing in performance-critical code; that's a design flaw, as @NathanOliver suggests in a comment.

thank you! I got the answer. Larger hierarchy could lead to lot more comparisons, and a custom mechanism doesn't allow the compiler to optimize away dynamic_casts. — Makarand Kokane, Sep 25 '18 at 14:35
Actually large hierarchy doesn't matter. A single dynamic_cast would be replaced by a single interview comparison. — Makarand Kokane, Sep 26 '18 at 01:41

Martin York · Answer 2 · 2018-09-25T19:04:39.553

3

In C++, "dynamic_cast" being slow is a known fact.

It was a known fact back thirty years ago (when I first started using C++ compilers) 1989/1990 (back then I used whatever compiler was installed by default on the sun workstations). BUT that is no longer true for modern compilers.

I remember downloading g++ (2.4 I think; approx 1996-1998) to look at what was happening (see below). The compiler (g++) would traverse the class hierarchy and at each level do a class name string comparison as it searched for the correct type record. This was done as each compilation unit potentially had its own unique set of type record (so you could not simply compare pointers).

And if not, then why isn't it a common practice

As a result a lot of people actually did write their own versions of dynamic cast to improve speed (but then again people were still trying to work out how to write good C++ in those days and most code was written like C so performance was probably not the fault of dynamic_cast). But the couple of implementations I saw were exceedingly brittle and subject to a lot of bugs. Which is why I was looked at how g++ worked. By this time (1998) the self written dynamic cast my company used was showing its age and I ripped it out to use the compiler version (no measurable performance change, but lots of mysterious bugs disappeared).

Thankfully this was fixed a long time ago. Before the end of the last century (I can not give you an exact date). But definitely prior to g++ 3.0.

I thought of following simple way of knowing the type of an object in a hierarchy.

You have identified the class by a number. This is not the same as a cast. You still need to convert the pointer to the correct location.

Could someone please explain if this could be slower than dynamic_cast?

I would suspect that you would have a hard time writing a version faster than the current implementation. Especially taking into account all the special corner cases with virtual inheritance and such. To be honest I doubt you could do a faster one for even a simple hierarchy like you show. The compiler has 30 years of optimization tricks applied to it. These tricks are applied at compile time your result is run time.

given that speed is the worst drawback of C++ over C?

I find this hard to believe. Do you have a citation?

edited Sep 25 '18 at 19:04

answered Sep 25 '18 at 14:37

Martin York

257,169
86
333
562

Thank you Martin! While I got my answers, I'm sorry for some loose statements there which I didn't focus much on while typing the questions. – Makarand Kokane Sep 25 '18 at 14:44
Actually large hierarchy doesn't matter. A single dynamic_cast would be replaced by a single interview comparison. – Makarand Kokane Sep 26 '18 at 01:41
1

@MakarandKokane. I presume you mean `integer` comparison. Sure But that still only tells you the type its not the same as a dynamic cast. Given the compilers ability to potentially do this at compile time compared with runtime you are going to loose out on many times on that. This solution is also very brittle and likely to have errors (the compiler will not have errors). All in all this is a bad idea and will always be done better by the compiler (and its still unlikely you would ever beat the compiler version). – Martin York Sep 26 '18 at 01:52
Thank you Martin for the detailed reply and your time. Agreed, another C-style cast or reinterpret_cast would be needed after the integer comparison. Yes, surely compiler optimization would be missed. May be compiler might be doing something similar under the hood on seeing a dynamic_cast? A single int comparison got to be faster than vtable lookup? Off course, excluding the cases where compiler is able to simply optimize away the lookup completely. Other 2 points are taken: C++ may not be faster than C, and need for use of dynamic_cast should be looked at first for design flaws. – Makarand Kokane Sep 26 '18 at 06:10
1

@MartinYork: `dynamic_cast` is much slower than integer comparison. It is very easy to beat the compiler version in a lot of circumstances. `dynamic_cast` needs to do a lot of things, it needs to handle all situations. In constrained cases (like no cross cast in multiple inheritance), one can create an own `dynamic_cast`, which is at least 10x faster than the compiler provided one. Plus: if the compiler can determine the type at `dynamic_cast` compile time, then it can determine `m_type` as well (in a quality implementation). – geza Sep 26 '18 at 09:09
@MartinYork: So, rolling an own rtti system (which a little bit more constrained, but 10x faster than `dynamic_cast`) is not a bad idea at all, and it is not true that "always be done better by the compiler". – geza Sep 26 '18 at 09:09
@geza Having lived through the era when people tried it the first time I have to disagree. The consequence is always brittle code that introduces bugs. If you think you can write code that is better than the compiler (after people have been optimizing the compiler for over 30 years please go right ahead), I think you will find that one of your internal optimizations has broken something that you forgot about. In a simple situation the compiler will generate just as simple code as you do. – Martin York Sep 26 '18 at 16:03
I didn't say that I can write better code than what's in the compiler. I say that **if** the case is restricted (like no cross cast in multiple inheritance), then it is possible to create rtti which is much faster than what the compiler provides. And this is a fact, I've such a system in place already, and it is generally 20x faster than `dynamic_cast`. "In a simple situation the compiler will generate just as simple code as you do": this is simply untrue. With current compilers, `dynamic_cast` will boil down to the same function call almost always. So it is almost never a cheap call. – geza Sep 26 '18 at 16:36
@geza I would have to see your code. The code above is 1) not even close to what you describe. I have not looked at compiler dynamic case implementation in 20 years but it was pretty darn good back then (I doubt it has got worse in that time frame). So I am still skeptical about your claims. – Martin York Sep 26 '18 at 16:58
**BUT** we have to look at why you need dynamic cast in the first place. In most well designed applications its not needed. So this is implying there is something wrong (or you are doing something very non standard) with the design in the first place, which suggests the code is already non optimal (or you have made it convoluted for some micro optimization). Either way its still a bad idea. Any time you gain will be completely wiped out by developer time required to maintain the brittle code. – Martin York Sep 26 '18 at 16:58
But please provide a link to this code. Lets see it. Lets review it on https://codereview.stackexchange.com – Martin York Sep 26 '18 at 16:59
1

@MartinYork: This discussion is not about why I need `dynamic_cast`. This is about the speed of it. Sure, `dynamic_cast`'s implementation is good. Perhaps. But again, I talk about a constrained case. Like OP's example. An `m_type` based solution. Do you really think, that a switch on `m_type` is not much faster than a whole `dynamic_cast`? Which needs to analyze type information, etc.? Reading `m_type` is just a memory read (inlined), and the `switch` can jump to the right place. Compare this to `dynamic_cast` (a function call), and the work it needs to do the cast properly. – geza Sep 26 '18 at 17:08
@MartinYork: I cannot share (I don't want to, to be honest) my source code, as it is big, and part of a larger framework. It does a lot more, not just `dynamic_cast`. But, maybe I'll just put together a little standalone benchmark, to show that `dynamic_cast` is "slow". As far as I remember, `dynamic_cast` took in the order of 10ns. And a simple `m_type` based system should take less than 1ns. Approximately. – geza Sep 26 '18 at 17:18
1

@geza Sure you can compare two things that do completely different things and compare them and the one that does nothing can definitely be ten times faster than one that does something. That's easy. If you can time something that takes 10ns accurately I would love to see your benchmarks. Lets review the code to see where it is going wrong. https://codereview.stackexchange.com – Martin York Sep 26 '18 at 18:28
Completely different things? Are you serious? "Does nothing"? For OP's problem, they does the same. And with some trickery, `m_type` can be extended to cover almost all features of `dynamic_cast`, and yet much faster. – geza Sep 26 '18 at 18:47
Check out this little snippet: `struct A { virtual ~A() { } }; struct B { }; int main() { A *volatile a = new A; int sum = 0; for (int i=0; i<1000000000; i++) sum += (bool)dynamic_cast(a); return sum;}` It runs in 8 seconds in my machine, so it means ~8ns for `dynamic_cast`. And just to compare, if I replace the dynamic cast to `sum += `, it runs in 0.2 sec. So `dynamic_cast` makes this program much slower (40x). – geza Sep 26 '18 at 18:48
@geza lol I don't think we need to review code. That's just so good by itself. Made my day thanks for the laugh. – Martin York Sep 26 '18 at 18:59
@MartinYork: care to explain? – geza Sep 26 '18 at 19:01
As I said above doing nothing is always much faster than doing something. If you want to show me something then write it and put it on https://codereview.stackexchange.com lets get it reviewed. This talking in comments is getting us nowhere. You are making arguments about hypotheticals code you can't or wont share or snippet straw man code examples that can't be tested and seems very much like broken code. But lets get it reviewed. Lets have a real discussion. Put the code on https://codereview.stackexchange.com/ – Martin York Sep 26 '18 at 19:09
@MartinYork: Can't be tested? Copy it into a file, compile and run it. You've said: "If you can time something that takes 10ns accurately I would love to see your benchmarks". There you have it. That little snippet does exactly that. Please, check it out. And after that, if you did, and you still think that my little snippet is wrong, then I promise I'll put it into codereview. But I don't want to put it into codereview for nothing. (I.e., you haven't really checked it, as "can't be tested" is clearly wrong). – geza Sep 26 '18 at 19:16
@geza Now your just trolling (or trying to make me laugh). There is no point continuing here in the comments. I await some real code in https://codereview.stackexchange.com so we can continue the discussion there. PS. Your code is UB so completely useless for anything; I am actually surprised it compiles. But it definitely does not show anything meaningful. – Martin York Sep 26 '18 at 19:33
https://codereview.stackexchange.com/questions/204426/does-this-little-snippet-measure-speed-of-dynamic-cast-correctly – geza Sep 26 '18 at 19:45
Where is the UB? – geza Sep 26 '18 at 22:04
@geza, its surprising that the compiler didn't optimize it in some way to give similar performance! Sometimes simple solutions are very difficult to accept. An exposed public member variable is a "turn-off" for good developers. It's unconventional. It should be made as const member for little wider acceptance. A mechanism could be put in place to generate unique Ids. – Makarand Kokane Sep 27 '18 at 05:08
1

@MakarandKokane: I think that it must not optimize it, see my comment below Martin's answer on codereview. "A mechanism could be put in place to generate unique Ids.": Sure. In my framework, I have such a system in place. It is not difficult to write. It is possible to create an rtti system, which is more advanced than C++'s one: much faster (if cross cast is not used), helps in (de)serializing objects, etc. – geza Sep 27 '18 at 09:50

In C++, is a "public unsigned int type" in base class, a faster alternative to dynamic_cast?

2 Answers2