
I've been optimising/benchmarking some code recently and came across this method:

public void SomeMethod(Type messageType)
{
    if (messageType == typeof(BroadcastMessage))
    {
        // ...
    }
    else if (messageType == typeof(DirectMessage))
    {
        // ...
    }
    else if (messageType == typeof(ClientListRequest))
    {
        // ...
    }
}

This is called from a performance-critical loop elsewhere, so I naturally assumed all those typeof(...) expressions were adding unnecessary overhead (a micro-optimisation, I know) and could be moved to private fields within the class. (I'm aware there are better ways to refactor this code; however, I'd still like to know what's going on here.)
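For concreteness, the change being considered would look roughly like the sketch below. The empty message classes, the static readonly fields (rather than instance fields), and the returned labels are all placeholders invented for illustration; the real types and branch bodies are not shown here.

```csharp
using System;

// Placeholder message types; the real ones are not shown in the question.
public sealed class BroadcastMessage { }
public sealed class DirectMessage { }
public sealed class ClientListRequest { }

public static class CachedTypeDispatch
{
    // The Type objects are looked up once and stored, instead of writing typeof(...) inline.
    private static readonly Type BroadcastType = typeof(BroadcastMessage);
    private static readonly Type DirectType = typeof(DirectMessage);
    private static readonly Type ClientListType = typeof(ClientListRequest);

    // Returns a label instead of doing real work, so the branch taken is observable.
    public static string Describe(Type messageType)
    {
        if (messageType == BroadcastType) return "broadcast";
        if (messageType == DirectType) return "direct";
        if (messageType == ClientListType) return "client-list";
        return "unknown";
    }

    public static void Main()
    {
        Console.WriteLine(Describe(typeof(DirectMessage)));
        Console.WriteLine(Describe(typeof(string)));
    }
}
```

Each comparison here has the same shape as F2 in the benchmark that follows: two Type references read from fields, with no typeof(...) visible at the comparison site.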

According to my benchmark this isn't the case at all (using BenchmarkDotNet).

[DisassemblyDiagnoser(printAsm: true, printSource: true)]
[RyuJitX64Job]
public class Tests
{
    private Type a = typeof(string);
    private Type b = typeof(int);

    [Benchmark]
    public bool F1()
    {
        return a == typeof(int);
    }

    [Benchmark]
    public bool F2()
    {
        return a == b;
    }
}

Results on my machine (Windows 10 x64, .NET 4.7.2, RyuJIT, Release build):

The functions compiled down to ASM:

F1

mov     rcx,offset mscorlib_ni+0x729e10
call    clr!InstallCustomModule+0x2320
mov     rcx,qword ptr [rsp+30h]
cmp     qword ptr [rcx+8],rax
sete    al
movzx   eax,al

F2

mov     qword ptr [rsp+30h],rcx
mov     rcx,qword ptr [rcx+8]
mov     rdx,qword ptr [rsp+30h]
mov     rdx,qword ptr [rdx+10h]
call    System.Type.op_Equality(System.Type, System.Type)
movzx   eax,al

I don't know how to interpret ASM, so I'm unable to understand the significance of what's happening here. In a nutshell, why is F1 faster?

Sam
  • Is `typeof(int)` calculated at runtime, or at compile time? – mjwills Feb 25 '19 at 20:48
  • https://blogs.msdn.microsoft.com/vancem/2006/10/01/drilling-into-net-runtime-microbenchmarks-typeof-optimizations/ talks a little about typeof optimisations. I suspect `The answer is that the JIT recognises this sequence and knows that while it seems like two System.Type object need to be created and compared, all you really want is a yes-no answer on a type question that can be answered using RuntimeTypeHandles. It thus substitutes that code` may be involved, although it is hard to say. – mjwills Feb 25 '19 at 20:53
  • Same IL in both worlds? – Clay Feb 25 '19 at 21:02
  • @Clay I believe so, the only difference I can see is that `[mscorlib]` is used in place of `[System.Runtime]` when running on .NET Framework instead of .NET Core. – Sam Feb 25 '19 at 21:07
  • The IL is completely irrelevant; what you want to be looking at is the jitted machine code. – Eric Lippert Feb 26 '19 at 07:29
  • @EricLippert Ah, I guess that explains the difference in speed between the two frameworks given the same IL. – Sam Feb 26 '19 at 11:52
  • @EricLippert After digging a little further, it turns out I was accidentally comparing the output of RyuJIT to the old X86 JIT which caused the large speed difference between the two frameworks. By default, Visual Studio 2017 enables the "Prefer 32-bit" option for release builds that target the .NET Framework, but doesn't for .NET Core release builds. Disabling this option causes the RyuJIT compiler to be used instead, resulting in almost identical performance between the frameworks. However, the `a == typeof(int)` comparison still remains faster. – Sam Feb 26 '19 at 14:18

2 Answers


The assembly you posted shows that the comment of mjwills is, as expected, correct. As the linked article notes, the jitter can be smart about certain comparisons, and this is one of them.

Let's look at your first fragment:

mov     rcx,offset mscorlib_ni+0x729e10

rcx is the "this pointer" of a call to a member function. The "this pointer" in this case will be the address of some CLR pre-allocated object, what exactly I do not know.

call    clr!InstallCustomModule+0x2320

Now we call some member function on that object; I don't know what. The nearest public function that you have debug info for is InstallCustomModule, but plainly we are not calling InstallCustomModule here; we're calling the function that is 0x2320 bytes away from InstallCustomModule.

It would be interesting to see what the code at InstallCustomModule+0x2320 does.

Anyways, we make the call, and the return value goes in rax. Moving on:

mov     rcx,qword ptr [rsp+30h]
cmp     qword ptr [rcx+8],rax

This looks like it is fetching the value of the field a out of this and comparing it to whatever the function returned.

The rest of the code is just perfectly ordinary: moving the bool result of the comparison into the return register.

In short, the first fragment is equivalent to:

return ReferenceEquals(SomeConstantObject.SomeUnknownFunction(), this.a);

Obviously an educated guess here is that the constant object and the unknown function are special-purpose helpers that rapidly fetch commonly-used type objects like typeof(int).

A second educated guess is that the jitter is deciding for itself that the pattern "compare a field of type Type to a typeof(something)" can best be made as a direct reference comparison between objects.

And now you can see for yourself what the second fragment does. It is just:

return Type.op_Equality(this.a, this.b);

All it does is call a helper method that compares two types for value equality. Remember, the CLR does not guarantee reference equality for all equivalent type objects.
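To make the last point concrete, here is a small sketch of two type objects that are value-equal without being reference-equal. It uses System.Reflection.TypeDelegator, a Type subclass that forwards to a wrapped type; the class and method names (TypeIdentityDemo, Wrap) are made up for this example.

```csharp
using System;
using System.Reflection;

public static class TypeIdentityDemo
{
    // Wraps typeof(int) in a fresh TypeDelegator, a Type subclass that forwards to the wrapped type.
    public static Type Wrap()
    {
        return new TypeDelegator(typeof(int));
    }

    public static void Main()
    {
        Type first = Wrap();
        Type second = Wrap();

        // The two wrappers are distinct objects, so a reference comparison fails.
        Console.WriteLine(ReferenceEquals(first, second));

        // Type.Equals compares UnderlyingSystemType, which is typeof(int) for both wrappers.
        Console.WriteLine(first.Equals(second));
    }
}
```

A comparison that only has two operands of static type Type has to allow for objects like these, which is why it cannot simply compare references.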

Now it should be clear why the first fragment is faster. The jitter knows hugely more about the first fragment. It knows, for instance, that typeof(int) will always return the same reference, and so you can do a cheap reference comparison. It knows that typeof(int) is never null. It knows the exact type of typeof(int) -- remember, Type is not sealed; you can make your own Type objects.

In the second fragment, the jitter knows nothing other than it has two operands of type Type. It doesn't know their runtime types, it doesn't know their nullity; for all it knows, you subclassed Type yourself and made up two instances that are reference-unequal but value-equal. It has to fall back to the most conservative position and call a helper method that starts going down the list: Are they both null? Is one of them null and the other non-null? Are they reference equal? And so on.
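The checklist above can be sketched in C# as follows. This is only an illustration of the kind of work a conservative comparison has to do; ConservativeEquals is a hypothetical name, and this is not the CLR's actual implementation of Type.op_Equality, which may differ between runtime versions.

```csharp
using System;
using System.Reflection;

public static class TypeEqualitySketch
{
    // Hypothetical stand-in for a conservative equality helper; not the CLR's actual code.
    public static bool ConservativeEquals(Type left, Type right)
    {
        // Same object, or both null.
        if (ReferenceEquals(left, right)) return true;

        // Exactly one side is null.
        if (ReferenceEquals(left, null) || ReferenceEquals(right, null)) return false;

        // Neither shortcut applied, so defer to the (possibly overridden) virtual Equals.
        return left.Equals(right);
    }

    public static void Main()
    {
        Console.WriteLine(ConservativeEquals(typeof(int), typeof(int)));
        Console.WriteLine(ConservativeEquals(typeof(int), null));
        Console.WriteLine(ConservativeEquals(new TypeDelegator(typeof(int)), new TypeDelegator(typeof(int))));
    }
}
```

Every branch here is something the jitter can skip in the first fragment, because it already knows one operand is a non-null, canonical type object.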

It looks like lacking that knowledge is costing you the enormous penalty of... half a nanosecond. I wouldn't worry about it.

mjwills
Eric Lippert
  • Wow, thanks for taking the time to break this down nicely; much appreciated. So it's not so much about accessing another variable that slows down F2, it's the way the equality is checked, cool. Yeah, I know it's the tiniest of micro-optimisations, I was just curious. :) – Sam Feb 26 '19 at 16:13
  • @EricLippert, can you see this question https://stackoverflow.com/questions/54907236/why-teventargs-wasnt-made-contravariant-in-standard-event-pattern-in-the-net-e ? – Zack ISSOIR Feb 27 '19 at 14:58

If you are curious, you can also look at the logic the jit uses, see gtFoldTypeCompare.

There are a whole bunch of things the jit can do to simplify or even eliminate type comparisons. They all require knowing something about the creation of the types being compared.
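As one illustration, a pattern where the jit can see how both types are created is a comparison between two typeof(...) expressions inside a generic method. The sketch below uses made-up names (TypeTestFolding, IsInt32); whether a particular runtime version reduces the comparison to a constant for a given T is an assumption you would need to confirm by inspecting the disassembly.

```csharp
using System;

public static class TypeTestFolding
{
    // Both operands are typeof(...) expressions, so the jit can see where each Type comes from.
    public static bool IsInt32<T>()
    {
        return typeof(T) == typeof(int);
    }

    public static void Main()
    {
        Console.WriteLine(IsInt32<int>());
        Console.WriteLine(IsInt32<string>());
    }
}
```

Contrast this with F2 in the question, where both operands are loaded from fields and nothing is known about how they were created.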

Andy Ayers