c# Generics - unexpected performance results

Question

I believe Microsoft claims that generics is faster than using plain polymorphism when dealing with reference types. However the following simple test (64bit VS2012) would indicate otherwise. I typically get 10% faster stopwatch times using polymorphism. Am I misinterpreting the results?

public interface Base { Int64 Size { get; } }
public class Derived : Base { public Int64 Size { get { return 10; } } }

public class GenericProcessor<TT> where TT : Base
{
    private Int64 sum;
    public GenericProcessor(){ sum = 0; }
    public void process(TT o){ sum += o.Size; }
    public Int64 Sum { get { return sum; } }
}
public class PolymorphicProcessor
{
    private Int64 sum;
    public PolymorphicProcessor(){ sum = 0; }
    public void process(Base o){ sum += o.Size; }
    public Int64 Sum { get { return sum; } }
}
static void Main(string[] args)
{
    var generic_processor = new GenericProcessor<Derived>();
    var polymorphic_processor = new PolymorphicProcessor();
    Stopwatch sw = new Stopwatch();
    int N = 100000000;
    var derived = new Derived();

    sw.Start();
    for (int i = 0; i < N; ++i) generic_processor.process(derived);
    sw.Stop();
    Console.WriteLine("Sum ="+generic_processor.Sum + " Generic performance = " + sw.ElapsedMilliseconds + " millisec");

    sw.Restart();
    sw.Start();
    for (int i = 0; i < N; ++i) polymorphic_processor.process(derived);
    sw.Stop();
    Console.WriteLine("Sum ="+polymorphic_processor.Sum+ " Poly performance = " + sw.ElapsedMilliseconds + " millisec");

Even more surprising (and confusing) is that if I add a type cast to the polymorphic version of processor as follows, it then runs consistently ~20% faster than the generic version.

        public void process(Base trade)
        {
            sum += ((Derived)trade).Size; // cast not needed - just an experiment
        }

What's going on here? I understand generics can help avoid costly boxing and unboxing when dealing with primitive types, but I'm dealing strictly with reference types here.

Can you post a reference to such claim that generics are faster for reference types? — Wiktor Zychla, Dec 12 '13 at 21:55
Generics are faster with `struct`s, because they allow you to avoid boxing/unboxing (you'd have to declare your parameter as `object` to make it work with both `int` and `long` e.g.). I know nothing about generics being faster with reference types. Could you provide any source about that? — MarcinJuraszek, Dec 12 '13 at 21:55
Benchmarking microoptimizations is *very* hard. There are *lots* of pitfals that you can come across when trying to benchmark operations such as this. As a rule, unless you're very experienced at it, the odds are very high that your results are almost entirely representing flaws in how you are benchmarking the code, not perforamnce differences in the code you are measuring. When the code you are measuring is time consuming enough to outweigh the benchmarking framework significantly, such errors don't invalidate your results. — Servy, Dec 12 '13 at 22:00
`sum += ((Derived)trade).Size;` -> that's one less v-table lookup (or however else managed code calls it), it should be faster. — Remus Rusanu, Dec 12 '13 at 22:01
@RemusRusanu But it also needs to do a type check, so that it can throw an exception if it's invalid. — Servy, Dec 12 '13 at 22:02
Just out of curiosity, do you see the same time discrepancy when you test the polymorphic first, and the generic second (switch the two for loops being tested)? — hatchet - done with SOverflow, Dec 12 '13 at 22:03
I read it originally here http://stackoverflow.com/a/117321/1295900 . Admittedly I can't find it at MS, but apparently Tony Northrup - co-author of MCTS 70-536 mentions this in his book. — Frank Longo, Dec 12 '13 at 22:07
The results of this test are all over the place for me. Release or Debug with or without optimizations.. what is quicker keeps switching. I think the test is flawed. — Simon Whitehead, Dec 12 '13 at 22:10
@SimonWhitehead That behavior would indicate that the two are virtually identical in effectiveness, and you are only observing errors and imperfections in the testing mechanism, which is exactly what I would expect to be the case here. — Servy, Dec 12 '13 at 22:18
As suggested, I ran the tests in different order and increased N by a factor of 10. Depending on the order, the generic version is sometimes faster than the polymorphic version. However, the polymorphic version with the cast is consistently and significantly faster regardless of the order. As Servy suggested, measuring the performance is not so simple. — Frank Longo, Dec 12 '13 at 22:26
@FrankLongo The answer you linked to states, that there is no proof that generics are faster for reference types: _I haven't been able to reproduce such performance benefits with generics compared to casting between reference types_ — MarcinJuraszek, Dec 12 '13 at 22:31
I am unfamiliar with this claim and it seems bizarre to test a claim that you can't clearly identify. My suspicion is that somewhere back in history the actual claim was misremembered or mistranscribed. Perhaps the original claim was, for example, that if you have `struct S{public T item;}` then accessing members of an array `S[]` is faster than accessing an array of `C[]` if C is an unsealed class type. This is true; the jitter can eliminate the check for unsafe array covariance if the element type is a struct. — Eric Lippert, Dec 13 '13 at 00:35

score 2 · Answer 1 · edited Jun 20 '20 at 09:12

Execute the test under .NET 4.5 x64 with Ctrl-F5 (without debugger). Also with N increased by 10x. That way the results reliably reproduce, no matter what order the tests are in.

With generics on ref types you still get the same vtable/interface lookup because there's just one compiled method for all ref types. There's no specialization for Derived. Performance of executing the callvirt should be the same based on this.

Furthermore, generic methods have a hidden method argument that is typeof(T) (because this allows you to actually write typeof(T) in generic code!). This is additional overhead explaining why the generic version is slower.

Why is the cast faster than the interface call? The cast is just a pointer compare and a perfectly predictable branch. After the cast the concrete type of the object is known, allowing for a faster call.

if (trade.GetType() != typeof(Derived)) throw;
Derived.Size(trade); //calling directly the concrete method, potentially inlining it

All of this is educated guessing. Validate by looking at the disassembly.

If you add the cast you get the following assembly:

enter image description here

My assembly skills are not enough to fully decode this. However:

16 loads the vtable ptr of Derived
22 and #25 are the branch to test the vtable. This completes the cast.
at #32 the cast is done. Note, that following this point there's no call. Size was inlined.
35 a lea implements the add
39 store back to this.sum

The same trick works with the generic version (((Derived)(Base)o).Size).

What I suspected but didn't validate. The call is inlined completely. You appear to be running on x64 as well though.. that could also be why we're all seeing differences.. given that the x86 and x64 jitters aren't the same. — Simon Whitehead, Dec 12 '13 at 22:26

Nick Bray · Answer 2 · 2013-12-12T22:22:10.527

I believe Servy was correct it is a problem with your test. I reversed the order of the tests (just a hunch):

internal class Program
{
    public interface Base
    {
        Int64 Size { get; }
    }

    public class Derived : Base
    {
        public Int64 Size
        {
            get
            {
                return 10;
            }
        }
    }

    public class GenericProcessor<TT>
        where TT : Base
    {
        private Int64 sum;

        public GenericProcessor()
        {
            sum = 0;
        }

        public void process(TT o)
        {
            sum += o.Size;
        }

        public Int64 Sum
        {
            get
            {
                return sum;
            }
        }
    }

    public class PolymorphicProcessor
    {
        private Int64 sum;

        public PolymorphicProcessor()
        {
            sum = 0;
        }

        public void process(Base o)
        {
            sum += o.Size;
        }

        public Int64 Sum
        {
            get
            {
                return sum;
            }
        }
    }

    private static void Main(string[] args)
    {
        var generic_processor = new GenericProcessor<Derived>();
        var polymorphic_processor = new PolymorphicProcessor();
        Stopwatch sw = new Stopwatch();
        int N = 100000000;
        var derived = new Derived();
        sw.Start();
        for (int i = 0; i < N; ++i) polymorphic_processor.process(derived);
        sw.Stop();
        Console.WriteLine(
            "Sum =" + polymorphic_processor.Sum + " Poly performance = " + sw.ElapsedMilliseconds + " millisec");


        sw.Restart();
        sw.Start();
        for (int i = 0; i < N; ++i) generic_processor.process(derived);
        sw.Stop();
        Console.WriteLine(
            "Sum =" + generic_processor.Sum + " Generic performance = " + sw.ElapsedMilliseconds + " millisec");

        Console.Read();
    }
    }

In this case the polymorphic is slower in my tests. This shows that the first test is significantly slower than the second test. It could be loading classes the first time, preemptions, who knows ...

I just want to note that I am not arguing that generics are faster or as fast. I'm simply trying to prove that these kinds of tests don't make a case one way or the other.

Increasing the number of iterations by 10x generics are always slower for me on .NET 4.5, x64. — usr, Dec 12 '13 at 22:17
I increased by 10 times just now and got the same results that the polymorphic is slower. I'm not trying to argue that generics are faster. I'm just making a case for the fact that this test does not prove that polymorphic is faster. — Nick Bray, Dec 12 '13 at 22:20

c# Generics - unexpected performance results

2 Answers2

16 loads the vtable ptr of Derived

22 and #25 are the branch to test the vtable. This completes the cast.

35 a `lea` implements the add

39 store back to `this.sum`

c# Generics - unexpected performance results

2 Answers2

16 loads the vtable ptr of Derived

22 and #25 are the branch to test the vtable. This completes the cast.

35 a lea implements the add

39 store back to this.sum

35 a `lea` implements the add

39 store back to `this.sum`