6

I was reading this question and it's accepted answer. I read the comments but I couldn't figure out the reason for the optimization produced.

Why does branching occur in assembly code when using the following code?

x >= start && x <= end

EDIT:
For clarity, I want to understand the reason of optimization produced by the accepted answer. That as I understand is the branching present in the assembly code produced by the compiler. I want to understand why is there a branch in the produced code.

Community
  • 1
  • 1
Aseem Bansal
  • 6,722
  • 13
  • 46
  • 84
  • 6
    Short Circuiting – Mysticial Jun 14 '13 at 17:47
  • @Mysticial I have no idea what that means in this context. I read that in the comments but couldn't understand that. That's why I started this question. – Aseem Bansal Jun 14 '13 at 17:49
  • @Mysticial: But short-circuiting makes no difference here, as there are no observable side-effects. The compiler would be entitled to implement this with a single branch. – Oliver Charlesworth Jun 14 '13 at 17:50
  • @OliCharlesworth Which is why I didn't make that an answer since I'm not 100% sure either. – Mysticial Jun 14 '13 at 17:51
  • @Jason I am asking about the reason for the optimization produced in the said [question's accepted answer](http://stackoverflow.com/questions/17095324/fastest-way-in-c-to-determine-if-an-integer-is-between-two-integers-inclusive). What is the reason for branching produced? – Aseem Bansal Jun 14 '13 at 17:55

1 Answers1

10

Note that the linked question has a fundamentally different expression

x >= start && x++ <= end

It is fundamentally different because the second sub-expression here has a side effect. I'll explain.

Note that && is a short-circuit operator. This means that if x >= start evaluates to false, the machine can branch over evaluation of x <= end.

More precisely, when the compiler emits instructions for x >= start && x <= end, it can emit instructions to branch over x <= end when x >= start evaluates to false.

However, I stress the use of the word can in the above statements. The reason for this is because x <= end has no side-effects and therefore it doesn't matter if the compiler branches over evaluation of it or not.

But, in the case that the second expression does have side effects the compiler must branch over it. Since && is a short-circuit operator, in a && b, if b has any side effects they must not be observed if a evaluates to false; this is a requirement of short-circuiting in C and most (if not all C-like languages).

So, in particular, when you look at

define POINT_IN_RANGE_AND_INCREMENT(p, range) 
    (p <= range.end && p++ >= range.start)

note that the second expression p++ >= range.start has a side effect. Namely, (post-)incrementing p by 1. But that side effect can only be observed if p <= range.end evaluates to true. Thus, the compiler must branch over evaluation of p++ >= range.start if p <= range.end evaluates to false.

The reason this results in a branch is because for machine to evaluate that expression, it uses the fact that if p <= range.end evaluates to false, then it automatically knows the entire expression evaluates to false and therefore it should not evaluate p++ >= range.start because it has a side-effect. Thus, it must branch over evaluating the second part of the expression. So in the assembly:

Ltmp1301:
 ldr    r1, [sp, #172] @ 4-byte Reload
 ldr    r1, [r1]
 cmp    r0, r1
 bls    LBB44_32
 mov    r6, r0         @ if the result of the compare is false 
 b      LBB44_33       @ branch over evaluating the second expression
                       @ to avoid the side effects of p++
LBB44_32:
 ldr    r1, [sp, #188] @ 4-byte Reload
 adds   r6, r0, #1
Ltmp1302:
 ldr    r1, [r1]
 cmp    r0, r1
 bhs    LBB44_36

Deep indebtedness to Oli Charlesworth for insightful comments.

Community
  • 1
  • 1
jason
  • 236,483
  • 35
  • 423
  • 525
  • Short-circuiting has better performance, especially when the second condition is hard to evaluate – Enigma Jun 14 '13 at 17:51
  • 1
    @Enigma: If the 2nd condition is trivial to evaluate, then short-circuiting probably results in significantly worse performance. Which is why I'm surprised the compiler didn't reduce this to a single branch (in the linked question). – Oliver Charlesworth Jun 14 '13 at 17:52
  • 1
    @Oli Charlesworth: Don't think so Oli. Look at the original post. The context is the definition of the macro(?) `POINT_IN_RANGE_AND_INCREMENT(p, range)` and why the compiler emits a branch when that macro is defined as `p <= range.end && p++ >= range.start`. – jason Jun 14 '13 at 17:57
  • @Jason: Yes, I've just spotted that the OP of the linked question only revealed the side-effect much later! (I was following the answers to that question yesterday.) Nevertheless, the answer to **this** question (as it currently stands) is not "short-circuiting". But perhaps the question needs to be edited... – Oliver Charlesworth Jun 14 '13 at 17:58
  • @OliCharlesworth What kind of editing? Any confusion regarding the question? I edited it once. – Aseem Bansal Jun 14 '13 at 18:01
  • @Oli Charlesworth: Hmm. I'm not seeing it. I think he's asking why the non-optimized version has the branch, so he can understand why the optimized version is so much better. The non-optimized version *does* have a branch because of short-circuiting. Do you disagree? – jason Jun 14 '13 at 18:02
  • 1
    @Zel: Because it turns out that in the linked question, the code isn't `(x > start && x < end)`, it's `(x > start && x++ < end)`, which makes a **huge** difference. – Oliver Charlesworth Jun 14 '13 at 18:03
  • @Jason That is exactly what I am asking. – Aseem Bansal Jun 14 '13 at 18:04
  • 2
    @Jason: In the OP's code, there is no side-effect, so short-circuiting is irrelevant. In the code in the linked question, there is (as you rightly point out) a side-effect, which *does* make a difference. – Oliver Charlesworth Jun 14 '13 at 18:04
  • @Oli Charlesworth: Doesn't matter. Short-circuiting still applies. – jason Jun 14 '13 at 18:05
  • 1
    @Jason: Not really. If there is no observable side-effect, the compiler is free to not jump out early (especially as that would be more optimal). – Oliver Charlesworth Jun 14 '13 at 18:06
  • @OliCharlesworth I am missing something. I didn't understand your confusion in the present question(not the linked question) – Aseem Bansal Jun 14 '13 at 18:06
  • @Zel: `(x > start && x < end)` is very different behaviourally to `(x > start && x++ < end)`. The "extra" branching applies to the second version, not what you have posted. – Oliver Charlesworth Jun 14 '13 at 18:07
  • 2
    @Zel The two questions are completely different beasts with regard to what kind of optimizations are possible. The linked question has a side effect in the later part which means the compiler is forced to do the short-circuiting for correctness. If there was no side effect not doing the short-circuiting would result in much better performance. – Voo Jun 14 '13 at 18:08
  • 2
    There's a difference between "may" and "must" when side effects are present or absent. If there are no side effects, the compiler may rearrange as it pleases (for speed), but if there are side effects—and there are—the compiler *must not* rearrange. It might be helpful to include that in the answer. – torek Jun 14 '13 at 18:09
  • @OliCharlesworth: I understand the point you are making. I will edit my answer accordingly. Give me a few minutes, and let me know if it makes sense? Thanks. – jason Jun 14 '13 at 18:15
  • @Oli Charlesworth: Good? Thanks for the comments. – jason Jun 14 '13 at 18:21