17

It just happens to me about one code design question. Say, I have one "template" method that invokes some functions that may "alter". A intuitive design is to follow "Template Design Pattern". Define the altering functions to be "virtual" functions to be overridden in subclasses. Or, I can just use delegate functions without "virtual". The delegate functions is injected so that they can be customized too.

Originally, I thought the second "delegate" way would be faster than "virtual" way, but some coding snippet proves it is not correct.

In below code, the first DoSomething method follows "template pattern". It calls on the virtual method IsTokenChar. The second DoSomthing method doesn't depend on virtual function. Instead, it has a pass-in delegate. In my computer, the first DoSomthing is always faster than the second. The result is like 1645:1780.

"Virtual invocation" is dynamic binding and should be more time-costing than direct delegation invocation, right? but the result shows it is not.

Anybody can explain this?

using System;
using System.Diagnostics;

class Foo
{
    public virtual bool IsTokenChar(string word)
    {
        return String.IsNullOrEmpty(word);
    }

    // this is a template method
    public int DoSomething(string word)
    {
        int trueCount = 0;
        for (int i = 0; i < repeat; ++i)
        {
            if (IsTokenChar(word))
            {
                ++trueCount;
            }
        }
        return trueCount;
    }

    public int DoSomething(Predicate<string> predicator, string word)
    {
        int trueCount = 0;
        for (int i = 0; i < repeat; ++i)
        {
            if (predicator(word))
            {
                ++trueCount;
            }
        }
        return trueCount;
    }

    private int repeat = 200000000;
}

class Program
{
    static void Main(string[] args)
    {
        Foo f = new Foo();

        {
            Stopwatch sw = Stopwatch.StartNew();
            f.DoSomething(null);
            sw.Stop();
            Console.WriteLine(sw.ElapsedMilliseconds);
        }

        {
            Stopwatch sw = Stopwatch.StartNew();
            f.DoSomething(str => String.IsNullOrEmpty(str), null);
            sw.Stop();
            Console.WriteLine(sw.ElapsedMilliseconds);
        }
    }
}
Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
Morgan Cheng
  • 73,950
  • 66
  • 171
  • 230

7 Answers7

21

Think about what's required in each case:

Virtual call

  • Check for nullity
  • Navigate from object pointer to type pointer
  • Look up method address in instruction table
  • (Not sure - even Richter doesn't cover this) Go to base type if method isn't overridden? Recurse until we find the right method address. (I don't think so - see edit at bottom.)
  • Push original object pointer onto stack ("this")
  • Call method

Delegate call

  • Check for nullity
  • Navigate from object pointer to array of invocations (all delegates are potentially multicast)
  • Loop over array, and for each invocation:
    • Fetch method address
    • Work out whether or not to pass the target as first argument
    • Push arguments onto stack (may have been done already - not sure)
    • Optionally (depending on whether the invocation is open or closed) push the invocation target onto the stack
    • Call method

There may be some optimisation so that there's no looping involved in the single-call case, but even so that will take a very quick check.

But basically there's just as much indirection involved with a delegate. Given the bit I'm unsure of in the virtual method call, it's possible that a call to an unoverridden virtual method in a massively deep type hierarchy would be slower... I'll give it a try and edit with the answer.

EDIT: I've tried playing around with both the depth of inheritance hierarchy (up to 20 levels), the point of "most derived overriding" and the declared variable type - and none of them seems to make a difference.

EDIT: I've just tried the original program using an interface (which is passed in) - that ends up having about the same performance as the delegate.

Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • On virtual calls there's no check for null. Also, the vtable of the methods is determined at compile time, so there's no run-time recursion over base classes. The compiler generates the pointer to the right method from the proper base class and puts it in the proper slot in the vtable. – Franci Penov Oct 19 '08 at 20:15
  • 6
    callvirt *does* check for null - see section 4.2 of partition 3 in the CLI spec, or P166 of CLR via C#. (If the reference were null, which implementation would be called?) Thanks for the confirmation of the "no recursion" bit though. That's what the experiment was basically suggesting. – Jon Skeet Oct 19 '08 at 20:32
  • 3
    +1 for the no recursion, vtables are flattened at compile type. – thinkbeforecoding Apr 16 '09 at 08:44
14

Just wanted to add a few corrections to john skeet's response:

A virtual method call does not need to do a null check (automatically handled with hardware traps).

It also does not need to walk up inheritance chain to find non-overriden methods (that's what the virtual method table is for).

A virtual method call is essentially one extra level of indirection when invoking. It is slower than a normal call because of the table look-up and subsequent function pointer call.

A delegate call also involves an extra level of indirection.

Calls to a delegate do not involve putting arguments in an array unless you are performing a dynamic invoke using the DynamicInvoke method.

A delegate call involves the calling method calling a compiler generated Invoke method on the delegate type in question. A call to predicator(value) is turned into predicator.Invoke(value).

The Invoke method in turn is implemented by the JIT to call the function pointer(s) (stored internally in the delegate object).

In your example, the delegate you passed should have been implemented as a compiler generated static method as the implementation does not access any instance variables or locals so therefore the need to access the "this" pointer from the heap should not be an issue.

The performance difference between delegate and virtual function calls should be mostly the same and your performance tests show that they are very close.

The difference could be due to the need to additional checks+branches because of multicast (as suggested by John). Another reason could be that the JIT compiler does not inline the Delegate.Invoke method and the implementation of Delegate.Invoke does not handle arguments as well as the implementation when performming virtual method calls.

9

A virtual call is dereferencing two pointers at a well-known offset in the memory. It's not actually dynamic binding; there is no code at runtime to reflect over the metadata to discover the right method. The compiler generates couple of instructions to do the call, based on the this pointer. in fact, the virtual call is a single IL instruction.

A predicate call is creating an anonymous class to encapsulate the predicate. That class has to be instantiated and there is some code generated to actually check whether the predicate function pointer is null or not.

I would suggest you look at the IL constructs for both. Compile a simplified version of your source above with a single call to each of the two DoSomthing. Then use ILDASM to see what is the actual code for each pattern.

(And I am sure I'll get downvoted for not using the right terminology :-))

Franci Penov
  • 74,861
  • 18
  • 132
  • 169
  • In my understanding, "the actual method to invoke is decided at runtime" is called "dynamic binding". – Morgan Cheng Oct 19 '08 at 13:22
  • A Predicate is just a delegate instance. It is a instance, but I'm not so sure there is a "anonymous class" created. Is there "anonymous class" concept in C#/.NET? – Morgan Cheng Oct 19 '08 at 13:23
  • A virtual call is not "deciding which method to invoke at runtime". the method to invoke is well known and already associated with the class. – Franci Penov Oct 19 '08 at 20:10
  • 1
    When I say "anonymous class", I meant a helper class generated by the compiler to wrap the delegate call. There is a separate concept of anonymous classes, introduced with LINQ. – Franci Penov Oct 19 '08 at 20:12
  • 2
    This answer contains a fundamental flaw: IL code generated by the C# compiler won't tell you how fast the code will run; the assembly code output by the JIT would be a more reliable way of gauging execution speed. This is because (1) IL is based on an abstract stack machine that is most likely different from the underlying (typically register-based) computer architecture, and therefore IL has to be transformed quite heavily before it can be executed; and because (2) optimizations are apparently performed mostly after, not before IL (i.e. by the JIT, not by the C# compiler). – stakx - no longer contributing Jan 25 '15 at 12:48
  • (Regarding that final claim in the above comment, see Eric Lippert's blog article, ["What does the optimize switch do?"](http://blogs.msdn.com/b/ericlippert/archive/2009/06/11/what-does-the-optimize-switch-do.aspx)) – stakx - no longer contributing Jan 25 '15 at 12:54
  • @stakx of course, looking at the assembly code also tells you nothing about the perf comparison, given that you have zero knowledge about how the transistors in the CPU will be triggered by the different instructions, and whether two different instructions will be even executed by the integer or floating point unit (if the CPU has one), or might even be early executed/skipped by the predictive branching logic (if the CPU supports one), or even whether the instructions will be executed on the CPU or passed through to the host CPU (if running the code in virtual machine). – Franci Penov Jan 26 '15 at 08:15
  • It's also unclear what the perf implications would be if the instructions happen to be executed through light switching, instead of electron gates. Not to mention that it's completely unclear whether the computation will be calculated by bits, or qubits, which might get totally different performance profile. So, yeah, pretty much every single performance-related answer on SO happens to have exactly the same fundamental flaw. – Franci Penov Jan 26 '15 at 08:16
  • @FranciPenov: You are correct. That's why I only called looking at assembly code "[a] *more reliable* [way of gauging execution speed]". With assembly code, the relationship to what actually gets executed is far closer to a being a 1:1 relationship than with IL; that's the important bit in my comments. Concerning quantum computers: let's stay practical; they are not exactly seeing mainstream use yet. But yes, my comments might be completely wrong in 10-20 years' time. – stakx - no longer contributing Jan 26 '15 at 09:37
3

Test result worth 1000 of words: http://kennethxu.blogspot.com/2009/05/strong-typed-high-performance_15.html

Kenneth Xu
  • 1,514
  • 15
  • 15
1

It is possible that since you don't have any methods that override the virtual method that the JIT is able to recognize this and use a direct call instead.

For something like this it's generally better to test it out as you have done than try to guess what the performance will be. If you want to know more about how delegate invocation works, I suggest the excellent book "CLR Via C#" by Jeffrey Richter.

Daniel Plaisted
  • 16,674
  • 4
  • 44
  • 56
  • Not only that, but if storing/calling delegates were more efficient than virtual dispatch, the framework would almost certainly store and call delegates instead of using virtual dispatch. Given that it doesn't, it probably isn't (except in rare/pathological cases). – technophile Oct 19 '08 at 04:45
  • I don't think it make difference the method is "virtual" in parent class or "override" in subclass. If it is "virtual", it is always dynamic binding. – Morgan Cheng Oct 19 '08 at 05:28
1

I doubt it accounts for all of your difference, but one thing off the top of my head that may account for some of the difference is that virtual method dispatch already has the this pointer ready to go. When calling through a delegate the this pointer has to be fetched from the delegate.

Note that according to this blog article the difference was even greater in .NET v1.x.

Michael Burr
  • 333,147
  • 50
  • 533
  • 760
0

virtual overrides have some sort of redirection table or something which is hardcoded and fully optimized at compile time. It's set in stone, very fast.

Delegates are dynamic which will always have an overhead and they seem to be objects too so that adds up.

You shouldn't worry about these small performance differences (unless developing performance critical software for the military), for most purposes good code structure wins over optimization.

Yes Fish...
  • 541
  • 3
  • 4