Back in 2009 I posted this answer to a question about optimisations for nested `try`/`catch`/`finally` blocks.
Thinking about this again some years later, it seems the question could be extended to other control flow constructs: not only `try`/`catch`/`finally`, but also `if`/`else`.
At each of these junctions, execution will follow one path. Code must be generated for both, obviously, but the order in which they're placed in memory, and the number of jumps required to navigate through them will differ.
The order in which generated code is laid out in memory has implications for the miss rate on the CPU's instruction cache. Having the instruction pipeline stalled, waiting for memory reads, can really kill loop performance.
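To make that concrete, here's the kind of shape I have in mind (a contrived example; the names are made up). The question is whether the JIT places the cold `else` body inline straight after the hot path, or moves it somewhere else:

```csharp
using System;
using System.Collections.Generic;

static class BranchLayout
{
    // The predicate is true on almost every iteration, so the 'else' body is
    // cold and could, in principle, be placed out of the hot instruction stream.
    static int SumNonNegative(List<int> values)
    {
        int sum = 0;
        foreach (var v in values)
        {
            if (v >= 0)
            {
                sum += v;                                  // hot path: ideally falls through
            }
            else
            {
                Console.WriteLine($"skipping {v}");        // cold path: candidate for out-of-line placement
            }
        }
        return sum;
    }

    static void Main() => Console.WriteLine(SumNonNegative(new List<int> { 1, 2, -3, 4 }));
}
```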
I don't think loops (`for`/`foreach`/`while`) are such a good fit, unless you expect the loop to have zero iterations more often than it has some, as the natural generation order seems pretty optimal.
Some questions:
- In what ways do the available .NET JITs optimise for generated instruction order?
- How much difference can this make in practice to common code? What about perfectly suited cases?
- Is there anything the developer can do to influence this layout? What about mangling it with the forbidden `goto`?
- Does the specific JIT being used make much difference to layout?
- Does the method inlining heuristic come into play here too? (A sketch of the only inlining knobs I know of follows this list.)
- Basically anything interesting related to this aspect of the JIT!
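On the inlining point, the sketch promised above: the only developer-facing knobs I'm aware of are the `MethodImpl` hints. They control whether a callee's code is folded into the caller's instruction stream rather than where blocks are placed, but that decision obviously feeds into the final layout. A minimal sketch, with made-up method names:

```csharp
using System;
using System.Runtime.CompilerServices;

static class InliningHints
{
    // Ask the JIT to inline this even if it might otherwise decline; the body
    // then becomes part of the caller's code stream and layout.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    static int Square(int x) => x * x;

    // Keep this as a separate call, e.g. rarely-taken error reporting, so its
    // code never appears in the caller at all.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static int ReportAndZero(int x)
    {
        Console.Error.WriteLine($"negative input: {x}");
        return 0;
    }

    static int SquareOrZero(int x) => x >= 0 ? Square(x) : ReportAndZero(x);

    static void Main()
    {
        Console.WriteLine(SquareOrZero(5));
        Console.WriteLine(SquareOrZero(-3));
    }
}
```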
Some initial thoughts:
Moving `catch` blocks out of line is an easy job, as they're supposed to be the exceptional case by definition. I'm not sure whether this actually happens.
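This is the situation I mean (hypothetical code): the `catch` body should essentially never execute, so placing it after the method's common-path code, or out of line entirely, ought to be safe:

```csharp
using System;
using System.IO;

static class ExceptionLayout
{
    // The catch body is the exceptional case by definition: when parsing millions
    // of well-formed lines it is almost never reached, so ideally its code would
    // not sit in the middle of the hot instruction stream.
    static long SumLines(TextReader reader)
    {
        long total = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            try
            {
                total += long.Parse(line);                      // hot path
            }
            catch (FormatException)
            {
                Console.Error.WriteLine($"bad line: {line}");   // cold path
            }
        }
        return total;
    }

    static void Main()
    {
        using (var reader = new StringReader("1\n2\noops\n3"))
            Console.WriteLine(SumLines(reader));
    }
}
```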
For some loops I suspect you can increase performance non-trivially. However, in general I don't think it'll make that much difference.
I don't know how the JIT decides the order of generated code. In C on Linux you have the `likely(cond)` and `unlikely(cond)` macros (wrappers around GCC's `__builtin_expect`), which you can use to tell the compiler which branch is the common path to optimise for. I'm not sure that all compilers respect these hints.
Instruction ordering is distinct from the problem of branch prediction, in which the CPU guesses (on its own, as far as I know) which branch will be taken in order to keep the pipeline (oversimplified steps: decode, fetch operands, execute, write back) fed with instructions before the execute step has determined the value of the condition variable.
I can't think of any way to influence this order in the C# language. Perhaps you can manipulate it a bit by `goto`ing to labels explicitly, but is this portable, and are there any other problems with it?
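Something like the following (contrived) shape is what I have in mind: move the rare branch's statements to a label at the bottom of the method, in the hope that the generated code follows the source order. I have no idea whether the JIT actually honours this:

```csharp
using System;

static class GotoLayout
{
    // Push the rare branch's code to a label at the bottom of the method, hoping
    // the generated code follows the source order. Whether the JIT honours this
    // is exactly what I'm unsure about.
    static int ParseOrZero(string text)
    {
        int result;
        if (!int.TryParse(text, out result))
            goto Invalid;               // rare case jumps away

        return result;                  // common case stays on the straight-line path

    Invalid:
        Console.Error.WriteLine($"could not parse '{text}'");
        return 0;
    }

    static void Main()
    {
        Console.WriteLine(ParseOrZero("42"));
        Console.WriteLine(ParseOrZero("not a number"));
    }
}
```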
Perhaps this is what profile-guided optimisation is for. Do we have that in the .NET ecosystem, now or planned? Maybe I'll go and have a read about LLILC.