97

This is a situation I encounter frequently as an inexperienced programmer, and I'm wondering about it particularly for an ambitious, speed-intensive project of mine that I'm trying to optimize. For the major C-like languages (C, Objective-C, C++, Java, C#, etc.) and their usual compilers, will these two functions run just as efficiently? Is there any difference in the compiled code?

void foo1(bool flag)
{
    if (flag)
    {
        //Do stuff
        return;
    }

    //Do different stuff
}

void foo2(bool flag)
{
    if (flag)
    {
        //Do stuff
    }
    else
    {
        //Do different stuff
    }
}

Basically, is there ever a direct efficiency bonus/penalty when breaking or returning early? How is the stackframe involved? Are there optimized special cases? Are there any factors (like inlining or the size of "Do stuff") that could affect this significantly?

I'm always a proponent of improved legibility over minor optimizations (I see foo1 a lot with parameter validation), but this comes up so frequently that I'd like to set aside all worry once and for all.

And I'm aware of the pitfalls of premature optimization... ugh, those are some painful memories.

EDIT: I accepted an answer, but EJP's answer explains pretty succinctly why the cost of the return is practically negligible (in assembly, the return creates a 'branch' to the end of the function, which is extremely fast. The branch alters the PC register and may also affect the cache and pipeline, which is pretty minuscule.) For this case in particular, it literally makes no difference, because both the if/else and the return create the same branch to the end of the function.

Philip Guin
  • 1,452
  • 15
  • 21
  • 22
I don't think this kind of thing will have a noticeable impact on performance. Just write a small test and see for yourself. Imo, the first variant is better since you don't get unneeded nesting, which improves readability – SirVaulterScoff Oct 25 '11 at 04:35
  • 10
    @SirVaulterScott, unless the two cases are symmetrical in some way, in which case you would want to bring out the symmetry by putting them at the same level of indentation. – luqui Oct 25 '11 at 07:23
  • 3
    SirVaulterScoff: +1 for reducing unneeded nesting – fjdumont Oct 25 '11 at 07:31
  • 11
    Readability >>> Micro optimizations. Do it whichever way makes more sense to the wetware who will be maintaining this. At a machine code level, these two structures are identical when fed into even a fairly dumb compiler. An optimizing compiler will erase any semblance of speed advantage between the two. – SplinterReality Oct 25 '11 at 07:51
  • 12
    Don't optimise your "speed-intensive" project by worrying about things like this. Profile your app to find out where it's actually slow - if it is actually too slow when you've finished making it work. You almost certainly can't guess what's actually slowing it down. – blueshift Oct 25 '11 at 08:01
  • 1
    Code first, optimize last. Always use a good profiler when you optimize. Never just change things because you *know* (read think) it will be faster. – Gilles Oct 25 '11 at 18:19
  • 3
With modern compilers, you can assume that if there are trivial ways you could rewrite the function, the compiler knows which one will work best. In this case in particular, though, most compilers will generate identical code. – Russell Borogove Oct 25 '11 at 18:53
Wow, lots of great feedback. My understanding thus far: compiler optimizations are hard to predict, and if you can easily imagine a way to optimize code, the compiler may already do this for you; therefore, you should generally ignore micro-optimizations. Still, I find it hard to believe that compiler optimizations leave absolutely no room for inventiveness. I mean, wouldn't some optimizations be too expensive to calculate in a reasonable amount of time? I guess I'm a compiler agnostic :) – Philip Guin Oct 25 '11 at 21:24
  • 2
Human inventiveness is most effectively applied to algorithmic level optimizations. If you do a bunch of linear search on an array, for example, the compiler doesn't know to sort it and change it to a binary search, or to replace the array with a tree/dictionary/map class. That's the optimization level that most programmers should be concerning themselves with these days (and if you're working in a higher level language than C, using appropriate containers will improve your code's readability and maintainability anyway). (See the sketch following these comments.) – Russell Borogove Oct 26 '11 at 01:40
  • It'd be great to pick which form is more readable and be consistent throughout the project, though. – Filip Dupanović Oct 28 '11 at 10:12
  • C++ has a difference in this _concept_ if not particular example. When a function has a return type that isn't void, and two different return points that return different objects, that disables NRVO, which _might_ cause it to run slightly slower. – Mooing Duck Jan 12 '12 at 01:17
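
To illustrate Russell Borogove's point above about algorithm-level optimization, here is a minimal C++ sketch (illustrative code, not from the thread; the container choices and names are assumptions). The compiler will happily optimize the linear scan itself, but it will never replace it with a binary search or a set lookup for you:

#include <algorithm>
#include <set>
#include <vector>

bool contains_linear(const std::vector<int>& v, int x) {
    return std::find(v.begin(), v.end(), x) != v.end();   // O(n) per lookup
}

bool contains_sorted(const std::vector<int>& v, int x) {
    // Requires v to be sorted once up front; each lookup is then O(log n).
    return std::binary_search(v.begin(), v.end(), x);
}

bool contains_set(const std::set<int>& s, int x) {
    return s.count(x) != 0;   // O(log n) lookups, no manual sorting needed
}

Switching containers or algorithms like this is exactly the kind of change no optimizer will make on your behalf.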

11 Answers

92

There is no difference at all:

=====> cat test_return.cpp
extern void something();
extern void something2();

void test(bool b)
{
    if(b)
    {
        something();
    }
    else
        something2();
}
=====> cat test_return2.cpp
extern void something();
extern void something2();

void test(bool b)
{
    if(b)
    {
        something();
        return;
    }
    something2();
}
=====> rm -f test_return.s test_return2.s
=====> g++ -S test_return.cpp 
=====> g++ -S test_return2.cpp 
=====> diff test_return.s test_return2.s
=====> rm -f test_return.s test_return2.s
=====> clang++ -S test_return.cpp 
=====> clang++ -S test_return2.cpp 
=====> diff test_return.s test_return2.s
=====> 

Meaning there is no difference whatsoever in the generated code, even without optimization, with either compiler.

Daniel
  • 30,896
  • 18
  • 85
  • 139
  • 60
    Or better: there is at least a version of a certain compiler that generates the same code for the two versions. – UncleZeiv Oct 25 '11 at 08:47
  • 11
    @UncleZeiv - most if not all compilers will translate the source to an execution flow graph model. It's hard to imagine a sane implementation that would give *meaningfully* different flow graphs for those two examples. About the only difference you might see is that the two different do-somethings get swapped - and even that may well be undone in many implementations to optimise branch prediction or for some other issue where the platform determines the preferred ordering. –  Oct 25 '11 at 09:55
  • 6
    @Steve314, sure, I was just nitpicking :) – UncleZeiv Oct 25 '11 at 10:15
  • @UncleZeiv: tested on clang too and same result – Daniel Oct 25 '11 at 17:23
I don't understand. It seems clear that `something()` will always be executed. In the original question, OP has `Do stuff` and `Do different stuff` depending on the flag. I'm not sure that the generated code will be the same. – Luc M Oct 25 '11 at 20:31
Ah, taking care of business I was too ignorant to do myself. I'll definitely be doing this from now on. @Luc M I just noticed that as well... update and I'll accept! Well, assuming it's still a great answer :) – Philip Guin Oct 25 '11 at 21:29
  • @Luc M - In both cases, the target code will either call `do_stuff` or `do_different_stuff`. In both cases, which is called depends on the flag. In both cases, the flag is checked, then the appropriate one of those calls is done. Machine code doesn't have a block-structured `if` statement, with `else` or otherwise, so the compiler has to translate. The intermediate step is an execution flow graph. Try manually converting to old-school flow charts - http://en.wikipedia.org/wiki/Flowchart - to understand why they both end up the same. –  Oct 28 '11 at 04:00
65

The short answer is, no difference. Do yourself a favour and stop worrying about this. The optimising compiler is almost always smarter than you.

Concentrate on readability and maintainability.

If you want to see what happens, build these with optimisations on and look at the assembler output.

blueshift
  • 6,742
  • 2
  • 39
  • 63
  • 8
@Philip: And do everybody else a favour as well and stop worrying about this. The code you write will be read and maintained by others as well (and even if you write code that will never be read by others, you still develop habits that will influence other code you write that will be read by others). *Always* write code to be as easy to understand as possible. – hlovdal Oct 25 '11 at 06:31
  • 8
Optimizers are not smarter than you!!! They are only faster at deciding where the impact does not matter too much. Where it really matters, you will, with some experience, most certainly optimize better than the compiler. – johannes Oct 25 '11 at 08:16
  • 10
@johannes Let me disagree. The compiler won't change your algorithm for a better one, but it does an amazing job at reordering instructions to achieve maximum pipeline efficiency and at other not-so-trivial things for loops (fission, fusion, etc.), where even an experienced programmer cannot decide a priori what is better unless he has intimate knowledge of the CPU architecture. – fortran Oct 25 '11 at 09:26
  • 3
    @johannes - for this question, you can assume it does. Also, in general, you may *occasionally* be able to optimise better than the compiler in a few special cases but that takes a fair bit of specialist knowledge these days - the normal case is that the optimiser applies most optimisations you can think of and does so systematically, not just in a few special cases. WRT this question, the compiler will probably construct *precisely* the same execution flow-graph for *both* forms. Choosing a better algorithm is a human job, but code-level optimisation is almost always a waste of time. –  Oct 25 '11 at 09:46
  • 4
I agree and disagree with this. There are cases when the compiler can't know that something is equivalent to something else. Did you know it's often way faster to do `x = ` than `if() x = ` Unneeded branches can really hurt. On the other hand, unless this is inside the main loop of an extremely intensive operation, I wouldn't worry about it either. (See the sketch after these comments.) – user606723 Oct 25 '11 at 14:00
  • 1
    Concentrate first and foremost on good algorithms and QA, then on readability and maintainability. If you still have time (I highly doubt it) you can focus on micro-optimizations. – CAFxX Dec 05 '11 at 12:51
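
As a rough sketch of the kind of case user606723 has in mind (a hypothetical example, not the code from the comment), these are the two shapes you would compare in a profiler; the branchy form depends on how predictable the condition is, while optimizers can often turn the ternary form into a branch-free conditional move:

#include <cstdint>

std::uint32_t clamp_branchy(std::uint32_t x, std::uint32_t hi) {
    if (x > hi)      // a branch: cheap when predictable, costly when it mispredicts
        x = hi;
    return x;
}

std::uint32_t clamp_branchless(std::uint32_t x, std::uint32_t hi) {
    return x > hi ? hi : x;   // compilers can often lower this to a conditional move
}

Whether this matters at all depends on the data; as the comment says, outside a hot loop it is not worth worrying about.
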
28

Interesting answers: although I do agree with all of them (so far), this question has implications that have up to now been completely disregarded.

If the simple example above is extended with resource allocation, and then error checking with a potential resulting freeing of resources, the picture might change.

Consider the naive approach beginners might take:

int func(..some parameters...) {
  res_a a = allocate_resource_a();
  if (!a) {
    return 1;
  }
  res_b b = allocate_resource_b();
  if (!b) {
    free_resource_a(a);
    return 2;
  }
  res_c c = allocate_resource_c();
  if (!c) {
    free_resource_b(b);
    free_resource_a(a);
    return 3;
  }

  do_work();

  free_resource_c(c);
  free_resource_b(b);
  free_resource_a(a);

  return 0;
}

The above would represent an extreme version of the style of returning prematurely. Notice how the code becomes very repetitive and unmaintainable as its complexity grows. Nowadays people might use exception handling instead:

int func(..some parameters...) {
  res_a a = nullptr;  // assume the handles are pointer-like and null until allocated
  res_b b = nullptr;
  res_c c = nullptr;

  try {
    a = allocate_resource_a(); // throws ExceptionResA
    b = allocate_resource_b(); // throws ExceptionResB
    c = allocate_resource_c(); // throws ExceptionResC
    do_work();
    free_resource_c(c);
    free_resource_b(b);
    free_resource_a(a);
  }
  catch (ExceptionBase& e) {
    // Could use the type of e here to distinguish the cases and
    // use different catch clauses instead.
    // Class ExceptionBase must be the base class of ExceptionResA/B/C.
    if (c) free_resource_c(c);
    if (b) free_resource_b(b);
    if (a) free_resource_a(a);
    throw;  // re-throw the original exception
  }
  return 0;
}

Philip suggested, after looking at the goto example below, using a break-less switch/case inside the catch block above. One could switch on typeof(e) and then fall through the free_resource_x() calls, but this is not trivial and needs design consideration. And remember that a switch/case without breaks is exactly like the goto with daisy-chained labels below...

As Mark B pointed out, in C++ it is considered good style to follow the Resource Acquisition Is Initialization principle, RAII for short. The gist of the concept is to use object instantiation to acquire resources. The resources are then automatically freed as soon as the objects go out of scope and their destructors are called. For interdependent resources, special care has to be taken to ensure the correct order of deallocation and to design the types of objects such that the required data is available to all destructors.
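
For example, a minimal RAII sketch (the allocate/free pair below are hypothetical stand-ins, not functions from the code above) can lean on std::unique_ptr with a custom deleter, so the release call runs on every path out of the function, whether it returns early, falls off the end, or unwinds after an exception:

#include <cstdlib>
#include <memory>

// Hypothetical stand-ins for a C-style allocate/free pair.
struct res_a { int dummy; };
res_a* allocate_resource_a() { return static_cast<res_a*>(std::malloc(sizeof(res_a))); }
void   free_resource_a(res_a* p) { std::free(p); }

int func() {
    // The deleter runs automatically when 'a' goes out of scope, on every exit
    // path: early return, normal return, or stack unwinding after an exception.
    std::unique_ptr<res_a, void (*)(res_a*)> a(allocate_resource_a(), &free_resource_a);
    if (!a) return 1;

    // do_work() would go here; no manual free_resource_a() call is needed.
    return 0;
}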

Or, in pre-exception days, one might do:

int func(..some parameters...) {
  res_a a = allocate_resource_a();
  res_b b = allocate_resource_b();
  res_c c = allocate_resource_c();
  if (a && b && c) {   
    do_work();
  }  
  if (c) free_resource_c(c);
  if (b) free_resource_b(b);
  if (a) free_resource_a(a);

  return 0;
}

But this over-simplified example has several drawbacks: it can be used only if the allocated resources do not depend on each other (e.g. it could not be used for allocating memory, then opening a file handle, then reading data from the handle into the memory), and it does not provide individual, distinguishable error codes as return values.

To keep code fast(!), compact, and easily readable and extensible, Linus Torvalds enforced a different style for kernel code that deals with resources, even using the infamous goto in a way that makes absolute sense:

int func(..some parameters...) {
  res_a a;
  res_b b;
  res_c c;

  a = allocate_resource_a();
  if (!a) goto error_a;
  b = allocate_resource_b();
  if (!b) goto error_b;
  c = allocate_resource_c();
  if (!c) goto error_c;

  do_work();

  free_resource_c(c);
error_c:
  free_resource_b(b);
error_b:
  free_resource_a(a);
error_a:
  return 0;
}

The gist of the discussion on the kernel mailing lists is that most language features that are "preferred" over the goto statement are implicit gotos, such as huge, tree-like if/else constructs, exception handlers, loop/break/continue statements, etc. And the gotos in the above example are considered ok, since they jump only a small distance, have clear labels, and free the code of other clutter for keeping track of the error conditions. This question has also been discussed here on stackoverflow.

However what's missing in the last example is a nice way to return an error code. I was thinking of adding a result_code++ after each free_resource_x() call, and returning that code, but this offsets some of the speed gains of the above coding style. And it's hard to return 0 in case of success. Maybe I'm just unimaginative ;-)
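
One common variant (a sketch using the same hypothetical allocate/free functions as above, not code taken from the kernel) sets a result code before each jump and returns it at the end; this keeps the single cleanup chain and still returns 0 on success:

int func(..some parameters...) {
  int ret = 0;
  res_a a;
  res_b b;
  res_c c;

  a = allocate_resource_a();
  if (!a) { ret = 1; goto error_a; }
  b = allocate_resource_b();
  if (!b) { ret = 2; goto error_b; }
  c = allocate_resource_c();
  if (!c) { ret = 3; goto error_c; }

  do_work();

  free_resource_c(c);
error_c:
  free_resource_b(b);
error_b:
  free_resource_a(a);
error_a:
  return ret;  /* 0 on success, 1/2/3 for the respective failed allocation */
}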

So, yes, I do think there is a big difference in the question of coding premature returns or not. But I also think it is apparent only in more complicated code that is harder or impossible to restructure and optimize for the compiler. Which is usually the case once resource allocation comes into play.

cfi
  • 10,915
  • 8
  • 57
  • 103
  • 1
    Wow, really interesting. I can definitely appreciate the unmaintainability of the naive approach. How would the exception handling improve on that particular case though? Like a `catch` containing a break-less `switch` statement on the error code? – Philip Guin Oct 25 '11 at 20:39
  • @Philip Added basic exception handling example. Note that only the goto has a fall-through possibility. Your proposed switch(typeof(e)) would help, but is [not trivial and needs design consideration](http://stackoverflow.com/q/351845/923794). And remember that a switch/case without breaks is exactly like the goto with daisy-chained labels ;-) – cfi Oct 26 '11 at 08:26
  • +1 this is the correct answer for C/C++ (or any language which requires manual freeing of memory). Personally, I don't like the multiple-label version. At my previous company, it was always "goto fin" (it was a French company). In fin we would de-allocate any memory, and that was the only use of goto that would pass code review. – Kip Oct 28 '11 at 18:09
  • 1
    Note that in C++ you wouldn't do any of these approaches, but would use RAII to make sure that the resources are cleaned up properly. – Mark B Nov 18 '11 at 05:27
12

Even though this isn't much of an answer: a production compiler is going to be much better at optimizing than you are. I would favor readability and maintainability over these kinds of optimizations.

Lou
  • 1,955
  • 14
  • 16
9

To be specific about this, the return will be compiled into a branch to the end of the method, where there will be a RET instruction or whatever it may be. If you leave it out, the end of the block before the else will be compiled into a branch to the end of the else block. So you can see in this specific case it makes no difference whatsoever.

user207421
  • 305,947
  • 44
  • 307
  • 483
  • Gotcha. I actually think this answers my question pretty succinctly; I guess it's literally just a register addition, which is pretty negligible (unless maybe you're doing systems programming, and even then...) I'm gonna give this an honorable mention. – Philip Guin Oct 26 '11 at 00:57
  • @Philip what register addition? There is no extra instruction in the path at all. – user207421 Oct 26 '11 at 01:15
  • Well both would have register additions. That's all an assembly branch is, isn't it? An addition to the program counter? I could be wrong here. – Philip Guin Oct 26 '11 at 01:19
  • 1
    @Philip No, an assembly branch is an assembly branch. It does affect the PC of course but it could be by completely reloading it, and it also has side-effects in the processor w.r.t. the pipeline, caches, etc. – user207421 Oct 26 '11 at 01:30
4

If you really want to know if there's a difference in compiled code for your particular compiler and system, you'll have to compile and look at the assembly yourself.

However in the big scheme of things it's almost certain that the compiler can optimize better than your fine tuning, and even if it can't it's very unlikely to actually matter for your program's performance.

Instead, write the code in the clearest way for humans to read and maintain, and let the compiler do what it does best: Generate the best assembly it can from your source.

Mark B
  • 95,107
  • 10
  • 109
  • 188
4

In your example, the return is noticeable. What happens to the person debugging when the return is a page or two above/below where //do different stuff occurs? Much harder to find/see when there is more code.

void foo1(bool flag)
{
    if (flag)
    {
        //Do stuff
        return;
    }

    //Do different stuff
}

void foo2(bool flag)
{
    if (flag)
    {
        //Do stuff
    }
    else
    {
        //Do different stuff
    }
}
PCPGMR
  • 340
  • 2
  • 7
  • Of course, a function should not be more than one (or even two) pages long. But the debugging aspect hasn't been covered in any of the other answers yet. Point taken! – cfi Oct 26 '11 at 08:36
3

I agree strongly with blueshift: readability and maintainability first! But if you're really worried (or just want to learn what your compiler is doing, which is definitely a good idea in the long run), you should look for yourself.

This will mean using a decompiler or looking at low-level compiler output (e.g. assembly language). In C#, or any .NET language, the tools documented here will give you what you need.

But as you yourself have observed, this is probably premature optimization.

Mr. Putty
  • 2,276
  • 1
  • 20
  • 20
1

From Clean Code: A Handbook of Agile Software Craftsmanship

Flag arguments are ugly. Passing a boolean into a function is a truly terrible practice. It immediately complicates the signature of the method, loudly proclaiming that this function does more than one thing. It does one thing if the flag is true and another if the flag is false!

foo(true);

in code will just make the reader navigate to the function and waste time reading foo(boolean flag).
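
As a sketch of the alternative (hypothetical names, assuming the two branches really are independent), the flag disappears and each call site states its intent:

#include <iostream>

// Hypothetical split of foo(bool flag) into two intention-revealing functions.
void fooForStuff()          { std::cout << "do stuff\n"; }
void fooForDifferentStuff() { std::cout << "do different stuff\n"; }

int main() {
    fooForStuff();            // reads clearly at the call site,
    fooForDifferentStuff();   // unlike foo(true) / foo(false)
}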

A better-structured code base will give you better opportunities to optimize the code.

Yuan
  • 2,690
  • 4
  • 26
  • 38
  • 1
    I'm just using this as an example. What's getting passed into the function could be an int, double, a class, you name it, its not really at the heart of the problem. – Philip Guin Oct 26 '11 at 00:43
The question you asked is about doing a switch inside of your function; in most cases that is a code smell. It can be achieved in many ways so that the reader does not have to read the whole function. Say, what does foo(28) mean? – Yuan Oct 26 '11 at 00:55
0

One school of thought (I can't remember the egghead who proposed it at the moment) is that every function should have only one return point, from a structural point of view, to make the code easier to read and debug. That, I suppose, is more a matter of programming religious debate.

One technical reason you may want to control when and how a function exits that breaks this rule is when you are coding real-time applications and you want to make sure that all control paths through the function take the same number of clock cycles to complete.

MartyTPS
  • 530
  • 2
  • 5
-5

I'm glad you brought this question up. You should always use the branches over an early return. Why stop there? Merge all your functions into one if you can (at least as much as you can). This is doable if there is no recursion. In the end, you will have one massive main function, but that is what you need/want for this sort of thing. Afterward, rename your identifiers to be as short as possible. That way when your code is executed, less time is spent reading names. Next do ...

Thomas Eding
  • 35,312
  • 13
  • 75
  • 106