
For one of my projects I'm using a lot of branches. Think:

if (foo) { do something } 
else if (bar) { do something } 
else if (bar2) { do something } 
... and so on

and

if (foo) { do something } 
if (bar) { do something } 
if (bar2) { do something } 
... and so on

What I've been wondering is if it makes sense to do sub-expression and/or logic eliminations to speed it up. For the sake of completeness, you can assume all this is in a single function. Say, if foo and bar have a sub-expression in common, you could write something like:

if (commonSubExpr) 
{
    if (foo without commonSubExpr) { do something }
    if (bar without commonSubExpr) { do something } 
}
if (bar2) { do something } 
... and so on

Similarly, there are a lot of simple boolean logic rules that you can apply to simplify the conditions.
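
For instance (a hypothetical illustration in the same pseudocode style as above, with a, b and c standing for arbitrary boolean sub-expressions), the distributive law lets you evaluate a only once:

if ((a && b) || (a && c)) { do something }

can be rewritten, since (a && b) || (a && c) is equivalent to a && (b || c), as:

if (a && (b || c)) { do something }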

My question is: does it make sense to do this at all? Or can I expect the JIT'ter to take care of this?

(According to the excellent article by Eric Lippert the compiler doesn't optimize this with the exception of constant folding - I assume this is still the case).

Update 1

Okay, I shouldn't have asked if it makes sense, because now I get nice people trying to explain to me what premature optimizations are, which was not what I was after... my mistake. Just assume I know about over-designing, premature optimization, etc. please -- that's exactly what I'm attempting to avoid here.

So attempt 2... I want to know How Things Work and what I can expect from the compiler / JIT'ter.

I also noticed that some context might help here, so here's a little about the application:

In this case the application is a domain-specific language that's compiled at run-time to IL using Reflection.Emit. There are good reasons I cannot use an existing language or an existing compiler. Performance is critical, and after compilation a lot of operations are executed (which is why it's compiled to IL in the first place instead of simply interpreting the code).
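
To give an impression of what I mean by compiling at run-time (a minimal, hypothetical sketch only - my actual compiler obviously emits far more than this), the code generation looks roughly like this:

using System;
using System.Reflection.Emit;

class EmitSketch
{
    // Emits the equivalent of "return start < end;" for two long arguments
    // and turns it into a callable delegate. Names are illustrative only.
    static Func<long, long, bool> CompileLessThan()
    {
        var method = new DynamicMethod("LessThan", typeof(bool),
                                       new[] { typeof(long), typeof(long) });
        var il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);   // push the first argument
        il.Emit(OpCodes.Ldarg_1);   // push the second argument
        il.Emit(OpCodes.Clt);       // 1 if arg0 < arg1, otherwise 0
        il.Emit(OpCodes.Ret);
        return (Func<long, long, bool>)method.CreateDelegate(typeof(Func<long, long, bool>));
    }

    static void Main()
    {
        var lessThan = CompileLessThan();
        Console.WriteLine(lessThan(1, 2));  // True
    }
}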

The reason I'm asking this is that I want to know to what extent I should design the optimizer in the compiler. If the JIT'ter takes care of sub-expression elimination, I'll design the optimizer to do only basic things like constant folding; if .NET expects this to happen in the compiler, I'll design it in the compiler. Depending on what I can expect, the optimizer will have a completely different design. Since branches will probably be the most important performance drains and because the implementation has a huge impact on my software design, I specifically decided to ask about this.

I know of no way to test this before implementing the compiler - which is quite a bit of work - so I wanted to get my basics straight before I started implementing. The reason I don't know how to test this is that I don't know in what cases the JIT'ter optimizes which pieces of code; I expect certain triggers in the .NET runtime to result in certain optimizations (making test results unreliable)... If you know a way around this, please let me know.

The expressions foo, bar, etc. can be of any form you usually see in code, but you can assume it's all in a single function. So, it can be of the form if (StartDate < EndDate), but cannot be something like if (method1() < method2()). To explain: in the latter case the compiler cannot simply make assumptions about the return values of the methods (you need information about the return values before you can optimize this), so sub-expression elimination is not trivial at all.
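
To make that concrete, here's a minimal sketch (with hypothetical methods, not something from my DSL) of why method calls block the elimination:

using System;

class SideEffectExample
{
    static int counter = 0;

    // Not pure: each call changes state, so two calls cannot be collapsed
    // into one cached result without changing the program's behaviour.
    static int Method1() { return ++counter; }
    static int Method2() { return 3; }

    static void Main()
    {
        if (Method1() < Method2()) Console.WriteLine("counter = " + counter);  // counter == 1
        if (Method1() < Method2()) Console.WriteLine("counter = " + counter);  // counter == 2
        // Eliminating the "common" sub-expression Method1() < Method2() would
        // call Method1() only once and leave counter at 1 - a different program.
    }
}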

So, as an example of sub-expression elimination:

if (int1 < int2 && int1 < int3) {
    //...
}
else if (int1 < int2 && int1 < int3) {
    //...
}

can be rewritten to:

if (int1 < int2)
{
    if (int1 < int3) {
        //...
    }
    else if (int1 < int3) {
        //...
    }
}

So to conclude: What I want to know is if these kinds of sub-expression elimination optimizations make sense or not - or if they are handled by the JIT'ter.

atlaste
  • Time it to see if it makes any difference? – Matthew Watson Sep 13 '13 at 10:24
  • @MatthewWatson I've considered that, but am not sure if that can be reliably measured in this case. I always see the Jitter as an unpredictable component. – atlaste Sep 13 '13 at 10:26
  • Well if you can't measure any difference, it's a pointless optimization. :) – Matthew Watson Sep 13 '13 at 10:33
  • @MatthewWatson Well that's certainly true in a lot of cases. That said, I want to know what I can expect from the JIT'ter so that I can decide in complex cases to optimize it for myself or trust the JIT'ter to do it for me. F.ex. I use a lot of Reflection.Emit - if I know that the JIT'ter will do certain kinds of optimizations, the emit code can be designed much easier. – atlaste Sep 13 '13 at 10:38
  • Trust the JIT to do the right thing, unless you have positive proof that it needs to be quicker and that it can be quicker. – thecoop Sep 13 '13 at 10:47
  • @thecoop Normally I do; in this case I'd like to know before I design my code. Imagine building a project like http://jurassic.codeplex.com/ - if you figure this out with tests after you've implemented the IL generator, you need to redesign a large portion of the code. Knowing what to trust is in this case better, so you can incorporate the important things in the design instead of having this nasty surprise afterwards. – atlaste Sep 13 '13 at 10:55
  • _I'd like to know before I design my code_ might not be a good idea. There are 2 compilers and a pipelined CPU between your code and the actual calculations. Micro optimization tends to be hardware dependent. Common sense still holds: make it work first, then find the bottlenecks. – H H Sep 13 '13 at 14:13
  • @HenkHolterman The pipelined CPU is described perfectly - in this case in the Intel books I have lying around downstairs. The first compiler is the one I'm writing myself. The only component missing is the second compiler, that I'm asking about. I had this discussion many, many times - there are a lot of people that just don't care anymore how things work, and there are a few people that care about all the nitty gritty tiny details. I care and use the resulting knowledge pragmatically -- if you're talking about "common sense" I strongly believe having knowledge about what you're doing helps. – atlaste Sep 13 '13 at 15:01
  • Those tiny details are undocumented implementation details subject to change. – Eric Lippert Sep 13 '13 at 21:42
  • @EricLippert Sure, if it was documented, I wouldn't ask. And if it changes, I see it as an opportunity to learn new things (in some cases the hard way). I see gathering knowledge as a good thing; while using that knowledge is a more complex story. – atlaste Sep 14 '13 at 07:46

2 Answers

5

So, it can be of the form if (StartDate < EndDate)

No, it can't. Your compiler needs to generate a call to the DateTime.op_LessThan() method. The trouble with generating calls to methods is that you cannot know with 100% certainty that the method will not have an observable side-effect. DateTime.op_LessThan does not, but that's not something your compiler can find out by itself. You would have to hard-code that rule in your compiler.
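
A rough sketch of what hard-coding such a rule might look like (the whitelist and its contents are just an assumption for illustration, not an existing API):

using System;
using System.Collections.Generic;
using System.Reflection;

static class PurityRules
{
    // Methods the DSL compiler is allowed to treat as side-effect free, and
    // therefore eligible for common sub-expression elimination.
    static readonly HashSet<MethodInfo> KnownPure = new HashSet<MethodInfo>
    {
        typeof(DateTime).GetMethod("op_LessThan"),
        typeof(Math).GetMethod("Abs", new[] { typeof(int) }),
    };

    public static bool IsKnownPure(MethodInfo method)
    {
        return KnownPure.Contains(method);
    }
}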

The jitter can, however; it knows what the code for that method looks like. And since it is very small, it will inline the method down to a single CPU instruction, which executes in less than one CPU cycle on average. The branch prediction logic built into the processor ensures that the branch isn't likely to stall the pipeline.

It is pretty hard to make the optimizer in your compiler pay off. It can only eliminate common sub-expressions for very simple code without side-effects, but such code already runs very fast. The C# compiler is a good model to follow: it doesn't optimize and leaves the job up to the jitter. Optimizations performed by the jitter are described in this answer. And yes, common sub-expression elimination is one such optimization it knows how to perform.

Whether or not it applies the optimization is, however, unpredictable; it depends on what other code it needs to generate in the method. I haven't checked this specific case, and I rather doubt it will, due to the branching. There is more to it than meets the eye if you also want to provide short-circuit evaluation for the && and || operators. The only way you can find out is by looking at the actual generated machine code. Pick the jitter you want to verify with the Platform target setting. Build the Release configuration of your test code. Then, under Tools + Options, Debugging, General, untick the "Suppress JIT optimization" option. And look at the machine code with a breakpoint and Debug + Windows + Disassembly. Watch out for fake test code; it often runs unrealistically fast if the jitter can optimize too much. And watch out for burning way too much time on this :)
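
A minimal test body to put the breakpoint on could look like this (the static fields are only there to keep the jitter from folding everything away; the names are illustrative):

using System;

class JitInspection
{
    // Static fields so the comparisons cannot be constant-folded away.
    static int int1 = 1, int2 = 2, int3 = 3;

    static void Main()
    {
        // Build in Release, untick "Suppress JIT optimization on module load",
        // put a breakpoint on the next line and open Debug + Windows + Disassembly.
        bool result = int1 < int2 && int1 < int3;
        Console.WriteLine(result);   // keep the result observable
    }
}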

Hans Passant
  • Hans, thanks for writing this down, this really helps (especially the link)! I was wondering how you would put a breakpoint in the emitted IL code though. I mean, sure I can 'step into', but there's quite a bit emitted. Do you have any suggestions about that? PS: you assumed (in my case incorrectly) that I store dates as DateTime - it's actually a `long` or `int` depending on the context; not that it matters in this case though. – atlaste Sep 13 '13 at 14:51
  • Use C# code to see what the jitter does. You can just set a breakpoint on the C# code itself. And you set breakpoints in the disassembly window. – Hans Passant Sep 13 '13 at 15:46
  • Okay, conclusion for now: the JIT'ter takes care of the heavy lifting, keep the compiler design simple (e.g. follow the post of Eric Lippert about what C# does), and optimize only the things that the compiler/JIT'ter cannot know about (probably domain-specific). This was what I was hoping for; nice. – atlaste Sep 14 '13 at 07:56
3

It seems that you're going in the wrong direction, as many people in the comments have already noted.

I would worry more about the maintainability and clarity of this code if you really intend to grow it significantly.

If your foo, bar, bar2, commonSubExpr are just booleans, this will probably be the fastest part of your application regardless of whether you use method 1 or 2.

If foo, bar, bar2, commonSubExpr are evaluation functions that might be expensive, then you should optimize the functions themselves and perhaps cache their results if possible. But in that case the cost has nothing to do with the composition and structure of the if/else clauses.
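
As a minimal sketch of what caching could look like (ExpensiveFoo is just a placeholder for one of those expensive evaluation functions):

using System;

class CachedEvaluation
{
    // Placeholder for an expensive, deterministic check.
    static bool ExpensiveFoo()
    {
        System.Threading.Thread.Sleep(100);   // pretend this is heavy work
        return true;
    }

    static void Main()
    {
        bool foo = ExpensiveFoo();   // evaluated once, reused below

        if (foo) Console.WriteLine("first branch");
        if (foo) Console.WriteLine("second branch, no re-evaluation");
    }
}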

UPDATE:

If you have code like this:

class Program
{
    static void Main(string[] args)
    {
        var int1 = 1;
        var int2 = 2;
        var int3 = 3;


        if (int1 < int2 && int1 < int3) {
            Console.WriteLine("Branch 1");
        }
        else if (int1 < int2 && int1 < int3) {
            Console.WriteLine("Branch 2");
        }
    }
}

The optimized MSIL will look like this:

.method private hidebysig static void  Main(string[] args) cil managed
{
  .entrypoint
  // Code size       44 (0x2c)
  .maxstack  2
  .locals init ([0] int32 int1,
           [1] int32 int2,
           [2] int32 int3)
  IL_0000:  ldc.i4.1
  IL_0001:  stloc.0
  IL_0002:  ldc.i4.2
  IL_0003:  stloc.1
  IL_0004:  ldc.i4.3
  IL_0005:  stloc.2
  IL_0006:  ldloc.0
  IL_0007:  ldloc.1
  IL_0008:  bge.s      IL_0019
  IL_000a:  ldloc.0
  IL_000b:  ldloc.2
  IL_000c:  bge.s      IL_0019
  IL_000e:  ldstr      "Branch 1"
  IL_0013:  call       void [mscorlib]System.Console::WriteLine(string)
  IL_0018:  ret
  IL_0019:  ldloc.0
  IL_001a:  ldloc.1
  IL_001b:  bge.s      IL_002b
  IL_001d:  ldloc.0
  IL_001e:  ldloc.2
  IL_001f:  bge.s      IL_002b
  IL_0021:  ldstr      "Branch 2"
  IL_0026:  call       void [mscorlib]System.Console::WriteLine(string)
  IL_002b:  ret
} // end of method Program::Main

However, the 2nd example:

static void Main(string[] args)
{
    var int1 = 1;
    var int2 = 2;
    var int3 = 3;

    if (int1 < int2)
    {
        if (int1 < int3)
        {
            Console.WriteLine("Branch 1");
        }
        else if (int1 < int3)
        {
            Console.WriteLine("Branch 2");
        }
    }
}

will produce 3 fewer instructions:

.method private hidebysig static void  Main(string[] args) cil managed
{
  .entrypoint
  // Code size       40 (0x28)
  .maxstack  2
  .locals init ([0] int32 int1,
           [1] int32 int2,
           [2] int32 int3)
  IL_0000:  ldc.i4.1
  IL_0001:  stloc.0
  IL_0002:  ldc.i4.2
  IL_0003:  stloc.1
  IL_0004:  ldc.i4.3
  IL_0005:  stloc.2
  IL_0006:  ldloc.0
  IL_0007:  ldloc.1
  IL_0008:  bge.s      IL_0027
  IL_000a:  ldloc.0
  IL_000b:  ldloc.2
  IL_000c:  bge.s      IL_0019
  IL_000e:  ldstr      "Branch 1"
  IL_0013:  call       void [mscorlib]System.Console::WriteLine(string)
  IL_0018:  ret
  IL_0019:  ldloc.0
  IL_001a:  ldloc.2
  IL_001b:  bge.s      IL_0027
  IL_001d:  ldstr      "Branch 2"
  IL_0022:  call       void [mscorlib]System.Console::WriteLine(string)
  IL_0027:  ret
} // end of method Program::Main

Roughly, the difference is these 3 instructions:

  IL_001a:  ldloc.1
  IL_001b:  bge.s      IL_002b
  IL_001d:  ldloc.0

According to other sources (read here), the JIT does not do this type of optimization, but even if it did, the difference would not be measurable to any extent.

Nenad
  • thanks for your answer. I've posted an update to my question. As for your evaluation-functions answer - I've also added a bit about that: in that case sub-expression elimination is non-trivial because functions can modify state (so the elimination result is different from the original expression). As for the 'just booleans': the expression depends on operations and variables that eventually evaluate to booleans... if it doesn't evaluate to a boolean, it of course doesn't compile. – atlaste Sep 13 '13 at 12:20
  • Thanks for the answer. (Have you compiled in release mode?) Unfortunately, extra branches are exactly what I was afraid of - branch prediction failures can significantly slow your application down if the variables that are used are more or less unpredictable (and in my case they are). That said, Hans argued that the JIT'ter does optimize it in this case... I'll read up on the links you provided, thanks. – atlaste Sep 13 '13 at 15:05
  • Yes, it was release mode. If your variables/functions are non-deterministic, then case 1 and case 2 are not guaranteed to produce the same result. That's exactly the reason why the compiler should not optimize that case. – Nenad Sep 13 '13 at 16:54
  • http://stackoverflow.com/a/7376853/186822 - I still have a feeling that you have somehow mixed topics: flowgraphs, inlining and branch prediction. – Nenad Sep 13 '13 at 18:06
  • Hmmhm... I'm interested in sub-expression elimination -because- it has an effect on branch prediction. Simply put, I make it a habit of eliminating branches if the data is unpredictable. Inlining is not really the topic here imo. I've read your links and they're mostly about inlining; for now, Hans has given me the answer that best describes what I was looking for (no worries, I'll click some points your way too :-). – atlaste Sep 14 '13 at 07:52