Which code-flow pattern is more efficient in C#/.NET?

Question

Consider the situation in which the main logic of a method should only actually run given a certain condition. As far as I know, there are two basic ways to achieve this:

If inverse condition is true, simply return:

public void aMethod(){
    if(!aBoolean) return;
    // rest of method code goes here
}

or

If original condition is true, continue execution:

public void aMethod(){
    if(aBoolean){
        // rest of method code goes here
    }
}

Now, I would guess that which of these implementations is more efficient is dependent on the language its written in and/or how if statements and return statements, and possibly method calls, are implemented by the compiler/interpreter/VM (depending on language); so the first part of my question is, is this true?

The second part of my question is, if the the answer to the first part is "yes", which of the above code-flow patterns is more efficient specifically in C#/.NET 4.6.x?

Edit: In reference to Dark Falcon's comment: the purpose of this question is not actually to fix performance issues or optimize any real code I've written, I am just curious about how each piece of each pattern is implemented by the compiler, e.g. for arguments sake, if it was compiled verbatim with no compiler optimizations, which would be more efficient?

Neither. They will most likely optimize to exactly the same thing. Fix performance problems by profiling to find them, not by micro-optimizing things which will not matter. As a matter of preference, I would choose the first as long as there is little or no code before the `if`. — Dark Falcon, Dec 29 '16 at 15:21
Well yeah, I know that. I'm not asking this question to actually optimize my code, I mostly am just curious about how each piece is implemented by the compiler, e.g. for arguments sake, if it was compiled verbatim with no compiler optimizations, which would be more efficient? — Mat Jones, Dec 29 '16 at 15:23
The point is that this compiles to IL, which has no particular concept of blocks of code. Flow control is done with jump instructions (basically like a `goto`). Most likely even just the process of converting to IL will result in the exact same IL code. — Dark Falcon, Dec 29 '16 at 15:25
Okay then for arguments sake pretend I'm asking the question for C, not C# (no IL, compiled directly to Assembly instructions). Does this change anything? — Mat Jones, Dec 29 '16 at 15:27
I think the first one..it reduce the nesting .. as Resharper suggest to you .. — federico scamuzzi, Dec 29 '16 at 15:29
@federicoscamuzzi Resharper only suggests this for readability reasons, not performance. — Mr47, Dec 29 '16 at 15:30
yes sure... it's just only for that .. no performance difference between the 2 pattern — federico scamuzzi, Dec 29 '16 at 15:31
@DarkFalcon I would've thought they would be optimised to exactly the same thing as well, but at least at the CIL level, they don't. You can try for yourself on http://tryroslyn.azurewebsites.net/. They're even different in Release mode. (Now, I do not expect this to make any difference performance-wise, but I haven't measured it.) — , Dec 29 '16 at 15:32
@CodyGray you're the first one to actually answer the question I was trying to ask! Thank you! — Mat Jones, Dec 29 '16 at 16:00

score 12 · Accepted Answer · edited May 23 '17 at 12:24

TL;DR It doesn't make a difference. Current generations of processors (circa Ivy Bridge and later) don't use a static branch-prediction algorithm that you can reason about anymore, so there is no possible performance gain in using one form or the other.

On most older processors, the static branch-prediction strategy is generally that forward conditional jumps are assumed to be taken, while backwards conditional jumps are assumed not-taken. Therefore, there might be a small performance advantage to be gained the first time the code is executed by arranging for the fall-through case to be the most likely—i.e.,
if { expected } else { unexpected }.

But the fact is, this kind of low-level performance analysis makes very little sense when writing in a managed, JIT-compiled language like C#.

You're getting a lot of answers that say readability and maintainability should be your primary concern when writing code. This is regrettably common with "performance" questions, and while it is completely true and unarguable, it mostly skirts the question instead of answering it.

Moreover, it isn't clear why form "A" would be intrinsically more readable than form "B", or vice versa. There are just as many arguments one way or the other—do all parameter validation at the top of the function, or ensure there is only a single return point—and it ultimately gets down to doing what your style guide says, except in really egregious cases where you'd have to contort the code in all sorts of terrible ways, and then you should obviously do what is most readable.

Beyond being a completely reasonable question to ask on conceptual/theoretical grounds, understanding the performance implications also seems like an excellent way to make an informed decision about which general form to adopt when writing your style guide.

The remainder of the existing answers consist of misguided speculation, or downright incorrect information. Of course, that makes sense. Branch prediction is complicated, and as processors get smarter, it only gets harder to understand what is actually happening (or going to happen) under the hood.

First, let's get a couple of things straight. You make reference in the question to analyzing the performance of unoptimized code. No, you don't ever want to do that. It is a waste of time; you'll get meaningless data that does not reflect real-world usage, and then you'll try and draw conclusions from that data, which will end up being wrong (or maybe right, but for the wrong reasons, which is just as bad). Unless you're shipping unoptimized code to your clients (which you shouldn't be doing), then you don't care how unoptimized code performs. When writing in C#, there are effectively two levels of optimization. The first is performed by the C# compiler when it is generating the intermediate language (IL). This is controlled by the optimization switch in the project settings. The second level of optimization is performed by the JIT compiler when it translates the IL into machine code. This is a separate setting, and you can actually analyze the JITed machine code with optimization enabled or disabled. When you're profiling or benchmarking, or even analyzing the generated machine code, you need to have both levels of optimizations enabled.

But benchmarking optimized code is difficult, because the optimization often interferes with the thing you're trying to test. If you tried to benchmark code like that shown in the question, an optimizing compiler would likely notice that neither one of them is actually doing anything useful and transform them into no-ops. One no-op is equally fast as another no-op—or maybe it's not, and that's actually worse, because then all you're benchmarking is noise that has nothing to do with performance.

The best way to go here is to actually understand, on a conceptual level, how the code is going to be transformed by a compiler into machine code. Not only does that allow you to escape the difficulties of creating a good benchmark, but it also has value above and beyond the numbers. A decent programmer knows how to write code that produces correct results; a good programmer knows what is happening under the hood (and then makes an informed decision about whether or not they need to care).

There has been some speculation about whether the compiler will transform form "A" and form "B" into equivalent code. It turns out that the answer is complicated. The IL will almost certainly be different because it will be a more or less literal translation of the C# code that you actually write, regardless of whether or not optimizations are enabled. But it turns out that you really don't care about that, because IL isn't executed directly. It's only executed after the JIT compiler gets done with it, and the JIT compiler will apply its own set of optimizations. The exact optimizations depend on exactly what type of code you've written. If you have:

int A1(bool condition)
{
    if (condition)    return 42;
    return 0;
}

int A2(bool condition)
{
    if (!condition)   return 0;
    return 42;
}

it is very likely that the optimized machine code will be the same. In fact, even something like:

void B1(bool condition)
{
    if (condition)
    {
        DoComplicatedThingA();
        DoComplicatedThingB();
    }
    else
    {
        throw new InvalidArgumentException();
    }
}

void B2(bool condition)
{
    if (!condition)
    {
        throw new InvalidArgumentException();
    }
    DoComplicatedThingA();
    DoComplicatedThingB();
}

will be treated as equivalent in the hands of a sufficiently capable optimizer. It is easy to see why: they are equivalent. It is trivial to prove that one form can be rewritten in the other without changing the semantics or behavior, and that is precisely what an optimizer's job is.

But let's assume that they did give you different machine code, either because you wrote complicated enough code that the optimizer couldn't prove that they were equivalent, or because your optimizer was just falling down on the job (which can sometimes happen with a JIT optimizer, since it prioritizes speed of code generation over maximally efficient generated code). For expository purposes, we'll imagine that the machine code is something like the following (vastly simplified):

C1:
    cmp  condition, 0        // test the value of the bool parameter against 0 (false)
    jne  ConditionWasTrue    // if true (condition != 1), jump elsewhere;
                             //  otherwise, fall through
    call DoComplicatedStuff  // condition was false, so do some stuff
    ret                      // return
ConditionWasTrue:
    call ThrowException      // condition was true, throw an exception and never return

C2:
    cmp  condition, 0        // test the value of the bool parameter against 0 (false)
    je   ConditionWasFalse   // if false (condition == 0), jump elsewhere;
                             //  otherwise, fall through
    call DoComplicatedStuff  // condition was true, so do some stuff
    ret                      // return
ConditionWasFalse:
    call ThrowException      // condition was false, throw an exception and never return

That cmp instruction is equivalent to your if test: it checks the value of condition and determines whether it's true or false, implicitly setting some flags inside the CPU. The next instruction is a conditional branch: it branches to the specification location/label based on the values of one or more flags. In this case, je is going to jump if the "equals" flag is set, while jne is going to jump if the "equals" flag is not set. Simple enough, right? This is exactly how it works on the x86 family of processors, which is probably the CPU for which your JIT compiler is emitting code.

And now we get to the heart of the question that you're really trying to ask; namely, does it matter whether we execute a je instruction to jump if the comparison set the equal flag, or whether we execute a jne instruction to jump if the comparison did not set the equal flag? Again, unfortunately, the answer is complicated, but enlightening.

Before continuing, we need to develop some understanding of branch prediction. These conditional jumps are branches to some arbitrary section in the code. A branch can either be taken (which means the branch actually happens, and the processor begins executing code found at a completely different location), or it can be not taken (which means that execution falls through to the next instruction as if the branch instruction wasn't even there). Branch prediction is very important because mispredicted branches are very expensive on modern processors with deep pipelines that use speculative execution. If it predicts right, it continues uninterrupted; however, if it predicts wrong, it has to throw away all of the code that it speculatively executed and start over. Therefore, a common low-level optimization technique is replacing branches with clever branchless code in cases where the branch is likely to be mispredicted. A sufficiently smart optimizer would turn if (condition) { return 42; } else { return 0; } into a conditional move that didn't use a branch at all, regardless of which way you wrote the if statement, making branch prediction irrelevant. But we're imagining that this didn't happen, and you actually have code with a conditional branch—how does it get predicted?

How branch prediction works is complicated, and getting more complicated all the time as CPU vendors continue to improve the circuitry and logic inside of their processors. Improving branch prediction logic is a significant way that hardware vendors add value and speed to the things they're trying to sell, and every vendor uses different and proprietary branch-prediction mechanisms. Worse, every generation of processor uses slightly different branch-prediction mechanisms, so reasoning about it in the "general case" is exceedingly difficult. Static compilers offer options that allow you to optimize the code they generate for a particular generation of microprocessor, but this doesn't generalize well when shipping code to a large number of clients. You have little choice but to resort to a "general purpose" optimization strategy, although this usually works pretty well. The big promise of a JIT compiler is that, because it compiles the code on your machine right before you use it, it can optimize for your specific machine, just like a static compiler invoked with the perfect options. This promise hasn't exactly been reached, but I won't digress down that rabbit hole.

All modern processors have dynamic branch prediction, but how exactly they implement it is variable. Basically, they "remember" whether a particular (recent) branch was taken or not taken, and then predict that it will go this way the next time. There are all kinds of pathological cases that you can imagine here, and there are, correspondingly, all kinds of cases in or approaches to the branch-prediction logic that help to mitigate the possible damage. Unfortunately, there isn't really anything you can do yourself when writing code to mitigate this problem—except getting rid of branches entirely, which isn't even an option available to you when writing in C# or other managed languages. The optimizer will do whatever it will; you just have to cross your fingers and hope that it is the most optimal thing. In the code we're considering, then, dynamic branch prediction is basically irrelevant and we won't talk about it any more.

What is important is static branch prediction—what prediction is the processor going to make the first time it executes this code, the first time it encounters this branch, when it doesn't have any real basis on which to make a decision? There are a bunch of plausible static prediction algorithms:

Predict all branches are not taken (some early processors did, in fact, use this).
Assume "backwards" conditional branches are taken, while "forwards" conditional branches are not taken. The improvement here is that loops (which jump backwards in the execution stream) will be correctly predicted most of the time. This is the static branch-prediction strategy used by most Intel x86 processors, up to about Sandy Bridge.

Because this strategy was used for so long, the standard advice was to arrange your if statements accordingly:
```
if (condition)
{
    // most likely case
}
else
{
    // least likely case
}
```
This possibly looks counter-intuitive, but you have to go back to what the machine code looks like that this C# code will be transformed into. Compilers will generally transform the if statement into a comparison and a conditional branch into the else block. This static branch prediction algorithm will predict that branch as "not taken", since it's a forward branch. The if block will just fall through without taking the branch, which is why you want to put the "most likely" case there.

If you get into the habit of writing code this way, it might have a performance advantage on certain processors, but it's never enough of an advantage to sacrifice readability. Especially since it only matters the first time the code is executed (after that, dynamic branch prediction kicks in), and executing code for the first time is always slow in a JIT-compiled language!
Always use the dynamic predictor's result, even for never-seen branches.

This strategy is pretty strange, but it's actually what most modern Intel processors use (circa Ivy Bridge and later). Basically, even though the dynamic branch-predictor may have never seen this branch and therefore may not have any information about it, the processor still queries it and uses the prediction that it returns. You can imagine this as being equivalent to an arbitrary static-prediction algorithm.

In this case, it absolutely does not matter how you arrange the conditions of an if statement, because the initial prediction is essentially going to be random. Some 50% of the time, you'll pay the penalty of a mispredicted branch, while the other 50% of the time, you'll benefit from a correctly predicted branch. And that's only the first time—after that, the odds get even better because the dynamic predictor now has more information about the nature of the branch.

This answer has already gotten way too long, so I'll refrain from discussing static prediction hints (implemented only in the Pentium 4) and other such interesting topics, bringing our exploration of branch prediction to a close. If you're interested in more, examine the CPU vendor's technical manuals (although most of what we know has to be empirically determined), read Agner Fog's optimization guides (for x86 processors), search online for various white-papers and blog posts, and/or ask additional questions about it.

The takeaway is probably that it doesn't matter, except on processors that use a certain static branch-prediction strategy, and even there, it hardly matters when you're writing code in a JIT-compiled language like C# because the first-time compilation delay exceeds the cost of a single mispredicted branch (which may not even be mispredicted).

Very, very interesting answer. Thank you! I knew a little about branch prediction and such, but I learned a lot from your answer. +1, and marked as accepted answer. — Mat Jones, Dec 29 '16 at 17:53

score 4 · Answer 2 · 2016-12-29T16:08:34.737

Same issue when validating parameters to functions.

It's much cleaner to act like a night-club bouncer, kicking the no-hopers out as soon as possible.

  public void aMethod(SomeParam p)
  {
     if (!aBoolean || p == null) 
         return;

     // Write code in the knowledge that everything is fine
  }

Letting them in only causes trouble later on.

  public void aMethod(SomeParam p)
  {
     if (aBoolean) 
     {
         if (p != null)
         {
             // Write code, but now you're indented 
             // and other if statements will be added later
         }

         // Later on, someone else could add code here by mistake.
     }

     // or here...
  }

The C# language prioritizes safety (bug prevention) over speed. In other words, almost everything has been slowed down to prevent bugs, one way or another. If you need speed so badly that you start worrying about if statements, then perhaps a faster language would suit your purposes better, possibly C++

Compiler writers can and do make use of statistics to optimize code, for example "else clauses are only executed 30% of the time".

However, the hardware guys probably do a better job of predicting execution paths. I would guess that these days, the most effective optimizations happen within the CPU, with their L1 and L2 caches, and compiler writers don't need to do a thing.

Yeah I know that. I wasn't really asking as much about maintainability/writing "clean" code as asking about the efficiency of the underlying Assembly instructions. — Mat Jones, Dec 29 '16 at 15:47
Any decent optimizing compiler will crunch your code the same way, regardless of how you write your if statements. Don't worry about it. — , Dec 29 '16 at 15:49
see the edit to my question, and/or my first comment on the original post — Mat Jones, Dec 29 '16 at 15:53

Pablo Recalde · Answer 3 · 2016-12-29T15:54:43.353

0

As [~Dark Falcon] mentioned you should not be concerned by micro optimization of little bits of code, the compiler will most probably optimize both approaches to the same thing.

Instead you should be very concerned about your program maintainability and ease of read

From this perspective you should choose B for two reasons:

It only has one exit point (just one return)
The if block is surrounded by curly braces

edit But hey! as told in the comments that is just my opinion and what I consider good practices

edited Dec 29 '16 at 15:54

answered Dec 29 '16 at 15:33

Pablo Recalde

3,334
1
22
47

_It only has one exit point (just one return)_ - very very subjective. Imagine method with five `if` statements and one return point at the end. As good "author" of my code, I don't want to force readers read all lines if first condition is false. Based on this own return point for every failed condition will be more readable – Fabio Dec 29 '16 at 15:35
1

"the compiler will most probably optimize both approaches to the same thing" -- I had just commented on the question, less than a minute before your answer, that this is not true and can be verified online. As for the rest of your answer, that's your personal opinion, and you're entitled to it, but your opinion is not universal and other people may have good reasons to disagree. Neither your opinion nor theirs makes for a good answer, since there is no way to judge it as right or wrong. – Dec 29 '16 at 15:36
@r1verside To be perfectly honest, I think your point #2 is pedantic/very, very much just your opinion because I could also just change the `if` block in the first one to `if(!aBoolean){ return; }`, invalidating your point about the curly braces... – Mat Jones Dec 29 '16 at 15:37
@mjones.udri What i mean is that to use curly braces even for just one statement is good practice, and that is not just my opinion, It applies also for not strong typed languages like ECMAScript5 where is really dangerous. – Pablo Recalde Dec 29 '16 at 15:43
@Fabio of course that there are situations where you'll not follow that rule, but this is the same for every rule or "good practice" it applies to most of the cases with some exceptions. – Pablo Recalde Dec 29 '16 at 15:44
"to use curly braces even for just one statement is good practice" this is just your opinion; the coding standard at my job is to inline blocks that are only a single statement, like `if(!aBoolean) return;` – Mat Jones Dec 29 '16 at 15:45
@mjones.udri http://softwareengineering.stackexchange.com/questions/16528/single-statement-if-block-braces-or-no/16530 – Pablo Recalde Dec 29 '16 at 15:50
1

"and that is not just my opinion" -- This *really* comes across poorly in written form. If you emphasise "my", if you mean that others share your opinion, then sure. If you emphasise "opinion", if you mean that it's fact, then absolutely not. Based on the rest of your sentence, I cannot tell which meaning you're going for. – Dec 29 '16 at 15:51
@r1verside, people need "best practices" when they don't know how to "organize" their work. People need "same practices" when they works in the team. When you start working on something new you try to keep by "best practices" for getting work done. When you worked inough you will understand the reason of the practices and even will invent your own. So it is very very subjective. Some teams findout that writing code without tests is much faster and productive and they happy with this. Other can argue with that – Fabio Dec 29 '16 at 15:52
Answers like this one bother me. They start off making wrong claims/guesses, and then end by saying something completely correct and unarguable. Of course code should be written with maintainability as the primary factor, but that doesn't really answer the question being asked. Besides, there's no justification for why rearranging the direction of an `if`/`else` statement would really hurt readability in the general case. – Cody Gray - on strike Dec 29 '16 at 16:03

score 0 · Answer 4 · answered Dec 29 '16 at 15:50

0

I am just curious about how each piece of each pattern is implemented by the compiler, e.g. for arguments sake, if it was compiled verbatim with no compiler optimizations, which would be more efficient?

The best way to test efficiency in this way is to run benchmarks on the code samples you're concerned with. With C# in particular it is not going to be obvious what the JIT is doing with these scenarios.

As a side note, I throw in a +1 for the other answers that point out that efficiency isn't only determined at the compiler level - code maintainability involves magnitudes of levels of efficiency more than what you'll get from this specific sort of pattern choice.

answered Dec 29 '16 at 15:50

Aaron Thomas

5,054
8
43
89

I am pretty sure benchmarks for this particular case will show nothing - it is exactly same logic with same amount of steps – Fabio Dec 29 '16 at 15:55
See @hvd's comment on question above. Surprising. – Aaron Thomas Dec 29 '16 at 15:57
1

Even the code will be compiled differently you will not notice any pattern in the benchmark results - in this particular case – Fabio Dec 29 '16 at 15:59
Benchmarking unoptimized code would be a complete waste of time, and would inevitably give you meaningless data. And it is quite obvious what the JIT is doing, you just look at the JITed code! In fact, that would really be the only good way to reason about this, given how difficult it would be to create a good test case that wouldn't be trivially optimized out, yet wouldn't be excessively noisy. – Cody Gray - on strike Dec 29 '16 at 16:00
@CodyGray can you elaborate on what you mean by "unoptimized code"? If that means C# (not JITed), then are you suggesting that the code be manipulated after JITed somehow? – Aaron Thomas Dec 29 '16 at 16:05
The quotation you pulled from the question says "if compiled verbatim with no compiler optimizations". That's what I'm referring to. Analyzing unoptimized code is meaningless. In C#, there are effectively two levels of optimization. The first is performed by the C# compiler, when it is generating IL. The second is performed by the JIT compiler when it is translating that IL into machine code. If you disable optimization at *either* stage and benchmark it in order to draw conclusions about performance, you're drawing conclusions based on incorrect and meaningless facts. – Cody Gray - on strike Dec 29 '16 at 16:07
@CodyGray okay that makes sense. Thanks for helping me understand. – Aaron Thomas Dec 29 '16 at 16:10

Which code-flow pattern is more efficient in C#/.NET?

4 Answers4