My C# code generator emits nested switch statements into a method of a class, which I compile dynamically at runtime, load, instantiate, and then execute. This runs up to 100x faster than the generic, non-compiled version, which has to use hash tables (the hash table keys, which become switch cases in the compiled version, are only known at runtime).
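As a rough illustration of the two variants described above, here is a minimal sketch. The class name `Dispatch`, the method names, the keys, and the return values are all made up for the example; this is not the actual generator output, only the general shape of a hash-table lookup versus the equivalent nested switch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical example; names, keys and values are illustrative only.
public static class Dispatch
{
    // Generic version: keys are only known at runtime, so they live in nested hash tables.
    private static readonly Dictionary<int, Dictionary<int, int>> Table =
        new Dictionary<int, Dictionary<int, int>>
        {
            { 1, new Dictionary<int, int> { { 1, 11 }, { 2, 12 } } },
            { 2, new Dictionary<int, int> { { 1, 21 } } },
        };

    public static int Generic(int a, int b)
    {
        Dictionary<int, int> inner;
        int result;
        if (Table.TryGetValue(a, out inner) && inner.TryGetValue(b, out result))
            return result;
        return 0; // no matching key
    }

    // Compiled version: the same keys turned into nested switch cases.
    public static int Compiled(int a, int b)
    {
        switch (a)
        {
            case 1:
                switch (b)
                {
                    case 1: return 11;
                    case 2: return 12;
                }
                break;
            case 2:
                switch (b)
                {
                    case 1: return 21;
                }
                break;
        }
        return 0; // no matching case
    }
}
```

Both methods return the same value for the same inputs, for example `Dispatch.Compiled(1, 2)` and `Dispatch.Generic(1, 2)` both yield 12, and an unknown key yields 0 in either version.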
As the switch statements get bigger, performance stays pretty much the same as long as the number of "switch hops" actually executed does not change, i.e. adding code to case blocks that are never executed does not affect performance.
However, this only holds up to a certain code size; beyond that, performance suddenly drops by a factor of 7 (when running in 32-bit mode) or 12 (when running in native 64-bit mode).
I had a look at the JITted code, and as the code grows, it does in fact change even for parts of the source that were not changed. Not being familiar with assembly and instruction sets, I assume there is something like a "short jump" and a "long jump", the former being limited in the number of bytes it can jump. Could someone explain to a high-level programmer why the generated machine code has to be, or is, different?
N.B. I'm aware that I'm testing code that does almost nothing, so even the smallest differences in the machine code naturally have a huge impact on relative performance. But the point of all this is to generate code that does as close to nothing as possible, as it is called hundreds of thousands of times per second.
Here are two different versions of the switch statement head when overall code size is relatively small and performance is good, as copied from Visual Studio using a JIT-optimized Release build running in 32-bit mode:
switch (a)
00000000 push ebp
00000001 mov ebp,esp
00000003 dec edx
00000004 cmp edx,3Bh
00000007 jae 0000021D
0000000d jmp dword ptr [edx*4+00773AD8h]
{
case 1: return 1;
And, with slightly more code in the un-entered case blocks - but still as fast:
switch (a)
00000000 push ebp
00000001 mov ebp,esp
00000003 lea eax,[edx-1]
00000006 cmp eax,3Bh
00000009 jae 00001C51
0000000f jmp dword ptr [eax*4+00A35830h]
{
case 1:
{
And this is the version for the much bigger code, which turns out to be 7 times slower:
switch (a)
00000000 push ebp
00000001 mov ebp,esp
00000003 push edi
00000004 push esi
00000005 sub esp,0FCh
0000000b mov esi,ecx
0000000d lea edi,[ebp+FFFFFEFCh]
00000013 mov ecx,3Eh
00000018 xor eax,eax
0000001a rep stos dword ptr es:[edi]
0000001c mov ecx,esi
0000001e mov dword ptr [ebp-0Ch],edx
00000021 mov eax,dword ptr [ebp-0Ch]
00000024 mov dword ptr [ebp-10h],eax
00000027 mov eax,dword ptr [ebp-10h]
0000002a dec eax
0000002b cmp eax,3Bh
0000002e jae 00000037
00000030 jmp dword ptr [eax*4+0077C488h]
00000037 jmp 0000888F
{
case 1:
{
N.B. I'm only posting the head of the switch statement, as that is the only thing that gets executed in my test: I always call the method in question with a value that matches no case label (and there's no default case), so it will just fall through and (I hope) not execute any code inside the switch.
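For reference, the kind of test described above could be sketched roughly as follows. This is only an assumed harness, not the actual test code: `SwitchBench`, `Generated` and `Measure` are hypothetical names, and the tiny three-case switch is a stand-in for the real generated method. The key point is that the method is always called with a value (here 0) that matches no case, so only the switch head runs.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing harness; all names here are illustrative.
public static class SwitchBench
{
    // Stand-in for the generated method (the real one is far larger).
    public static int Generated(int a)
    {
        switch (a)
        {
            case 1: return 1;
            case 2: return 2;
            case 3: return 3;
        }
        return 0; // reached when no case matches
    }

    // Calls the method repeatedly with a value that is in no case
    // and returns the elapsed Stopwatch ticks.
    public static long Measure(int iterations)
    {
        int sink = 0;
        Generated(0); // warm-up call so JIT compilation is not part of the timing

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            sink += Generated(0);
        sw.Stop();

        // Use the accumulated result so the calls cannot be discarded as dead code.
        if (sink != 0)
            throw new InvalidOperationException("unexpected case match");
        return sw.ElapsedTicks;
    }
}
```

Calling `SwitchBench.Measure(10000000)` for both the small and the large generated method and comparing the tick counts would give the relative slowdown. Note that a method this small may be inlined by the JIT, which the real, much larger generated method presumably is not.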