19

I am writing some code for data analysis, and have to exclude samples based on some criteria. In practice I end up writing code such as:

bool Test(SampleType sample)
{
  if( ! SubTest1(sample) )
    return false;
  if( ! SubTest2(sample) )
    return false;
  if( ! SubTest3(sample) )
    return false;

  return true;
}

The following seems equivalent to me:

bool Test(SampleType sample)
{
  if( ! SubTest1(sample) )
    return false;
  else if( ! SubTest2(sample) )
    return false;
  else if( ! SubTest3(sample) )
    return false;
  else 
    return true;
}

Is there a difference in terms of computing cost? Is there a arguable preferential one in terms of extendibility/maintainability, aesthetics, etc...?

I know this is probably an inconsequential issue, but once I get these questions stuck in my head I NEED to find the answer.

PS: in case anyone cares, my actual code as of 15/09 can be found at the following: http://folk.uio.no/henrikq/conf.tgz

qonf
  • 1,051
  • 1
  • 10
  • 21
  • 21
    If you are concerned about performance, then measure, with real data, the concrete compiler and library versions, and on the targeted platform. Until you've done this, prefer the version that's easier to read and maintain. – sbi Sep 14 '11 at 10:44
  • 3
    @sbi I am more concerned with getting a good thought out answer to a question that's stuck in my so that i can get back to working. Thanks for the suggestion though. – qonf Sep 14 '11 at 10:47
  • 5
    @qonf there is an easy answer: the best approach is to stop OCDing about "inconsequential issues" and get back to working. – R. Martinho Fernandes Sep 14 '11 at 10:50
  • 2
    @qonf: There is no good answer. This is highly depending on the compiler's code generator and what the target platform does with the generated code. Just learn to get over such questions and write _clean_ code first, and profile it later. – sbi Sep 14 '11 at 10:50
  • 6
    Is there no place here where you can ask a question on how things work without others telling you to do it some other way? – Shahbaz Sep 14 '11 at 11:11
  • 1
    @R. Martinho Fernandes If it was that easy... – qonf Sep 14 '11 at 11:33

10 Answers10

32

Compiler generates the same code for both the versions. But the 1st version is better in maintainability aspect if you compare just with the 2nd version.

The code exits when the return statement is encountered; so there is no use of keeping else in the upcoming if. It makes the developer understand the code better.

Also, if this is the literal code then you can still shrink as,

bool Test(SampleType sample)
{
  return (SubTest1(sample) && SubTest2(sample) && SubTest3(sample));
}
iammilind
  • 68,093
  • 33
  • 169
  • 336
  • 2
    It is not the literal code, but thanks. – qonf Sep 14 '11 at 10:50
  • 6
    Can't you argue that the second one is more maintainable because you won't forget to put the else if it stops being an early return? – R. Martinho Fernandes Sep 14 '11 at 10:52
  • 2
    @R. Martinho, there can be numerous possibilities like that. I have answered the exact OP's code. If the `return` is not there then the entire logic of the code will change. In fact, personally I avoid to keep multiple `return` values in the function (as it tends to place destructor code for automatic object at every `return` value). However, OP's requirements are different – iammilind Sep 14 '11 at 10:57
  • @iammilind I considered declaring a variable at start of the function, and then modify it depending on the results of the SubTest, but I came to the conclusion that i could not see how it would add anything. Unfortunatly, I try to do to much stuff in my "Test" function to make it a one-liner, It could probably be done with a number of additional function definitions, but In my oppinion c/c++ does not lend itself to functional programming (not that i am an expert on functional programming). – qonf Sep 14 '11 at 11:38
  • @qonf, if you still see the scope of improvement then you may want to start a new thread with the type of code you want. You may get some really good suggestions/answers. – iammilind Sep 14 '11 at 12:28
  • @iammilind: Are questions such as: "Here is my code, how can it be better?", well received? – qonf Sep 14 '11 at 12:50
  • 1
    @qonf, if you want to have your code reviewed then there is sister site [codereview.se](http://codereview.stackexchange.com/). If you can produce minimal amount of your code then this site will be a proper platform. Many people will well receive it many will criticize it; but I guess you will at least get a reasonable answer. – iammilind Sep 14 '11 at 12:56
29

I would do this:

return SubTest1(sample) && SubTest2(sample) && SubTest3(sample);

This won't improve performance, but what the function do may (or may not) be more obvious.

Arnaud Le Blanc
  • 98,321
  • 23
  • 206
  • 194
  • 2
    Whoa, this is what I call optimization :) – plaes Sep 14 '11 at 10:37
  • 15
    This is cool, but very hard to step over in debugger. – sharptooth Sep 14 '11 at 10:39
  • 1
    @plaes You think there's a performance difference? – selalerer Sep 14 '11 at 10:40
  • Do you think this would be any more efficient, computing wise using a compiler such as gcc? In my case It would end up as a 300 char length piece of code though, either that or i would need to define more functions. The example I gave was a simplified version of what i have in practice. – qonf Sep 14 '11 at 10:41
  • @sharptooth: maybe a *bit* harder. – Karoly Horvath Sep 14 '11 at 10:41
  • 10
    @selalerer: Nope, readability "optimization". – plaes Sep 14 '11 at 10:41
  • I think with optimizations this code and the two posted by the OP should compile to the same thing.. – Karoly Horvath Sep 14 '11 at 10:42
  • @yi_H And without optimization? I think it will be quite difficult for such a simple code to come out different in any of the options given here. – selalerer Sep 14 '11 at 10:50
  • None of you have considered the complexity of `subtest[n](sample)' Clearly what is needed is profiling of the subtest functions to ensure they are ordered such that the subtests which are most often false occur first. The speed of completion of the subtests should also be taken into consideration when designing the optimum order... [[Enjoy spending more time being OCD!! :-)]] – James Greenhalgh Sep 14 '11 at 11:18
  • 1
    @plaes: +1 to your comment for "readability optimization" – R.. GitHub STOP HELPING ICE Sep 14 '11 at 12:27
  • @James Greenhalgh You are of course assuming that the operations do not depend on each other. For instance, if SubTest2(sample) depends on SubTest1(sample) returning true, your "optimization" would merely introduce a bug, – luiscubal Sep 14 '11 at 13:30
  • @luiscubal Very true. In fact, we would really like to decide whether subtest1 returned at all, or whether it could potentially loop forever (in which case we would not need to generate code for subtest[2..n]). This sounds a difficult problem to solve, so we should probably halt our optimisation efforts there... – James Greenhalgh Sep 14 '11 at 13:37
  • @James Greenhalgh Even if subtest1 didn't return in case of error, the code for subtest2, etc. would need to be there in case subtest1 passed. Also, I'd say the "infinite loop" solution is very rarely the correct one. I'd return an error code, throw an exception or abort the program. – luiscubal Sep 14 '11 at 13:45
13

Both are completely equivalent in terms of behavior and compilers will likely emit identical machine code for both.

They don't even differ that much in terms of readability - both have the same amount of nesting. Compare this to case where no early return leads to deep nesting.

Community
  • 1
  • 1
sharptooth
  • 167,383
  • 100
  • 513
  • 979
5

In terms of readability,

bool Test( SampleType sample )
{
    return SubTest1( sample )
        && SubTest2( sample )
        && SubTest3( sample );
}

is far clearer than either of your options. Otherwise, you definitely want the else to make it clear that once one of the conditions has been met, none of the others will be tested.

James Kanze
  • 150,581
  • 18
  • 184
  • 329
  • Thanks, but my actual case is slightly more convoluted; its not compressible to an readable single statement. – qonf Sep 14 '11 at 12:44
  • 1
    @qonf So present your actual case, so we can see it. (What I've found most effective for rendering complex `bool` functions readable is breaking the expression down into subexpressions, handled by separate functions.) – James Kanze Sep 14 '11 at 13:10
  • 2
    See question for my actual case. It was not my initial intent to show my acctuall code, i just wanted to figure out if there where any actual difference for the statement in the OP. I think my solution was pretty close to what you suggested. – qonf Sep 15 '11 at 08:54
2

Both are identical. The compiler is likely to figure out how to optimize them so that identical machine code is emitted. I would prump for the 1st code as it is easier to read.

Ed Heal
  • 59,252
  • 17
  • 87
  • 127
2

I can't imagine it will make any difference in performance.

I would use the first one if they were independent tests, and you just want to drop out after one them suceeds and the second one if it's choosing between a set of logical alternatives.

jcoder
  • 29,554
  • 19
  • 87
  • 130
0

As far as I know, there's no difference in the resulted machine code. Besides that, who cares? Do you have a performance problem with this code? Performance is a difficult subject that usually should be left to the end of the project and only if there are performance problems they should be found and fixed WHERE THEY ACTUALLY EXIST and not where you think in advance they might be.

selalerer
  • 3,766
  • 2
  • 23
  • 33
  • True, if only my mind would let these things go... – qonf Sep 14 '11 at 10:42
  • I don't agree with anything you said. – Widor Sep 14 '11 at 10:42
  • If two pieces of code are identical in terms of complexity, there's no reason to not choose the one that's faster. – Andreas Brinck Sep 14 '11 at 10:46
  • This is the only true answer, and it's downvoted. I fear for SO's quality. – sbi Sep 14 '11 at 10:48
  • @Andreas: If two pieces of code are identical in complexity (big-O), then there's no reason to not to choose the one that is easier to read an maintain. – sbi Sep 14 '11 at 10:49
  • 2
    _Performance is a difficult subject that usually should be left to the end of the project_ why would you leave it for later when you can code it good from the beginning? I agree that OP is going too far with this example, but I don't agree with the fact that while coding you should ignore performance. If you learn which methods are better (whether your "better" is "more optimized" or "more readable"), you should apply from the very beginning rather than coming back for fixes. – Shahbaz Sep 14 '11 at 11:10
  • @sbi Arguably, its not a answer as much as a: "Instead of answering your technical question, I am going to tell you what to focus on when doing what i presume is your overarching goal." – qonf Sep 14 '11 at 11:42
  • @Shahbaz: You shouldn't completely ignore performance during the design and implementation of an application. But during implementation you should only worry about picking algorithms with less complexity (big-O), not about micro-optimizations like what the OP is asking about. And I wouldn't even put too much worry into that, because algorithms could be replaced later. It just makes no sense to put too much effort into this before you know whether the performance of a specific piece of code really matters. Because it matters only in 20% of your code, and which 20% that are only profiling shows. – sbi Sep 14 '11 at 12:21
  • 2
    @sbi It can be extremely helpful to point out to someone that they are going about something wrong. It can also be extremely frustrating for the asker if you are mistaken in your assumption on what the ultimate goal of the asker is! Therefore, some humility is prudent when answering in such a manner. Its arrogant to presume you know what a person is trying to accomplish when you only have that persons technical question. – qonf Sep 14 '11 at 12:33
  • @qonf: That is why I posted mine as a comment, rather than as an answer. That doesn't mean, though, that this answer is worth downvoting. – sbi Sep 14 '11 at 12:38
  • @sbi I agree, which is why i have not done so. – qonf Sep 14 '11 at 12:40
0

It mostly depends on how the compiler optimize the resulting code.

the if(.) is typically translated with a compare_to_zero instruction, followed by a conditional jump.

In your case, this will be

CMP(c1)
RETZ
CMP(c2)
RETZ
CMP(c3)
RETZ

or (with the else-s)

   CMP(c1)
   JNZ(a1)
   RET
a1:CMP(c2)
   JNZ(a2)
   RET
a2:CMP(c3)
   JNZ(a3)
   RET
a3:RET

A second pass of the optimizer will see the chain of jumps, invert the then/else (and Z/NZ) skip off the jumps, being at the point a "jump to next", giving exactly the previous code.

Emilio Garavaglia
  • 20,229
  • 2
  • 46
  • 63
  • 1
    *my lowly "high level programming" brain explodes!!! – qonf Sep 14 '11 at 12:45
  • This is not actually true for any modern compiler with optimizations turned on. – Omnifarious Jan 09 '12 at 06:50
  • @Omnifarious Can you be more accurate? WHAT is not actually true? I also spoke about optimization. What are you adding on? – Emilio Garavaglia Jan 09 '12 at 07:32
  • @EmilioGaravaglia: The assembly doesn't show up the way you state. A set of nested `if ... else` statements will like result in nearly identical code to a series of `if` statements with multiple returns. I doubt you could find any measurable performance differences at all. – Omnifarious Jan 09 '12 at 08:25
  • @Omnifarious: I don't understand the meaning of your comment: It seems to me you are just saying what I just said, with different wording... Am I awake or my post ends with "giving exactly the previous code" ?!?! – Emilio Garavaglia Jan 09 '12 at 21:50
  • @EmilioGaravaglia - Oh, yes, you're right. _sigh_ Mea culpa. – Omnifarious Jan 10 '12 at 12:14
0

It depends on the language and the surrounding APIs

The first solution is an typical imperative approach, the focus is on the act of checking something: abort if a, abort if b, abort if c, success.

The second solution is a more functional aproach, one could read it as the curly brace in mathematical definitions (as in: fac(n) = { n <= 1: 1, n > 1: n * fac(n-1)).

The solution should be chosen to match the environment. In Java, Python, etc it would be the first. In ruby the second could be fine.

keppla
  • 1,753
  • 2
  • 15
  • 29
-4

there is a good practice that is called "one point of entry, one point of exit". It means that it's better to have only one return in a function.

For now, your function is simple, but in the future, it may grow, become more complicated, allocate temporary variables which have to be freed, etc.

So it's better to use good practices everywhere. It's called "defensive programming", defense against human failures (mine and my colleagues').

I would write it this way :

bool Test(SampleType sample)
{
  bool bRet = true; // by default, set it to the most "defensive" value which will cause the less harm or make the problem evident

  // in the future, you may have inits and allocations here

  if ( !SubTest1(sample) )
    bRet = false; // don't match
  else if ( !SubTest2(sample) )
    bRet = false; // don't match
  else if ( !SubTest3(sample) )
    bRet = false; // don't match
  else
    ; // bRet stays true

  // thanks to the single point of exit, you can do things like that
  if (bRet)
     log("... match");
  else
     log("...no match")

  // here you clean temporary resources...

  return bRet;
}

If you want to improve performance, the best way is to put the SubTestX functions in the best order so that the one that doesn't match most often is first, so less tests are needed to find that it doesn't match.

Offirmo
  • 18,962
  • 12
  • 76
  • 97
  • +1 - This is the answer I just wanted to write! – Stephan Sep 14 '11 at 11:45
  • 4
    oh dear, "good practice" and hungarian notation in C# in the same post? – AakashM Sep 14 '11 at 12:01
  • 5
    Introducing a "state variable" to manage a "false state" is far being a good practice. Especially on certain platforms. Multiple return has their good reason to exist as goto has its own good reason to exist, as if/then/else as for(;;) and while(). (these last to with and without break.) "Good practice" are contextual. And there is not enough "context" here to suggest them. – Emilio Garavaglia Sep 14 '11 at 12:11
  • 6
    Single-return-point is false economy. I see no reason to bend control structures around to accommodate it, as you have done here. – jprete Sep 14 '11 at 12:25
  • 4
    This is nonsense. "Single entry, single exit" is a dinosaur from the times we were coding in languages with no exceptions and no means of resource management. Unless you're hacking in C or assembler, there's no reason to complicate a simple algorithm to the point where I have to mentally execute it, in order to have an idea which values control variables might have, just to understand what it is doing. Control structures will always be easier to follow than control variables. – sbi Sep 14 '11 at 12:37
  • Well, if you were in *my* company *right now*, you would like this rule to have been followed by all the previous developpers ! It's very easy to miss one little "return" in a complex control flow. And how do you handle temp variables deallocation if there are plenty of returns ? You duplicate the cleaning code before each return ? To me, this rule has proved robust and useful in all my 7+ years of experience. – Offirmo Sep 14 '11 at 17:10
  • @emilio where do you see a state variable ? It's just the plain return value. – Offirmo Sep 14 '11 at 17:15
  • 1
    @offfirmo bRet is a "state variable". It stores a value that is changed as conditions are evaluated. – Emilio Garavaglia Sep 14 '11 at 19:56
  • @AakashM: Who said C#? The question is tagged `c++` and `c`, and the code looks like valid C to me. (That said, I agree that Hungarian notation is not a good practice, in any language.) – Daniel Pryden Sep 14 '11 at 21:51
  • 1
    OK, a colleague of mine has been nicer than SO users and actually explained why the above code is not correct in modern C++ : cleaning resource at the end of a function makes it non thread-safe. Objects should "clean" themselves automatically using RAII/auto_ptr/unique_ptr (http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization). The RAII technique can be used to print a log when we step out of the function, regardless of the place. Regarding hungarian notation, it is generally not encouraged (http://en.wikipedia.org/wiki/Hungarian_notation) but still valid in some case. – Offirmo Sep 16 '11 at 16:00
  • having no more than one return is no more than prejudice.... – Bhupesh Pant Sep 30 '13 at 18:53