
So at the suggestion of a colleague, I just tested the speed difference between the ternary operator and the equivalent If-Else block... and it seems that the ternary operator yields code that runs up to twice as fast as the If-Else version. My code is:

  gettimeofday(&tv3, 0);
  for(i = 0; i < N; i++)
  {
     a = i & 1;
     if(a) a = b; else a = c;
  }
  gettimeofday(&tv4, 0);


  gettimeofday(&tv1, 0);
  for(i = 0; i < N; i++)
  {
     a = i & 1;
     a = a ? b : c;
  }
  gettimeofday(&tv2, 0);

(Sorry for using gettimeofday and not clock_gettime... I will endeavor to better myself.)

I tried changing the order in which I timed the blocks, but the results seem to persist. What gives? Also, the If-Else shows much more variability in terms of execution speed. Should I be examining the assembly that gcc generates?

By the way, this is all at optimization level zero (-O0).

Am I imagining this, or is there something I'm not taking into account, or is this a machine-dependent thing, or what? Any help is appreciated.

Evan Teran
Patrick87
  • Even better, try `a = i%2 ? b : c`, and then compare with optimization `-O2` or `-O3`. – Kerrek SB Jul 19 '11 at 21:38
  • A good compiler would notice that this is equivalent to `a = (N & 1) ? c : b`. But where can I find such a compiler? (yes, yes, as long as N > 0). – Omri Barel Jul 19 '11 at 21:39
  • Benchmarks with optimizations turned off are meaningless... – Evan Teran Jul 19 '11 at 21:39
  • *"By the way, this is all at optimization level zero (-O0)."* That means you've told the compiler not to optimize (basically), so it's no surprise that it's going to generate more verbose (and therefore slower) code for more verbose code. If you throw even an `-O1` at it, I suspect you won't notice any difference at all. Not seeing much point in looking at the distinction in performance in unoptimized code. – T.J. Crowder Jul 19 '11 at 21:42
  • Note: `?:` is the conditional operator, a ternary operator, not THE ternary operator. – Sani Huttunen Jul 19 '11 at 21:48
  • I understand that benchmarking with no optimization isn't particularly meaningful, but then again I'm not really asking about making this code run as fast as possible... I'll just take a look at the assembly and verify the answers that mention cmov. – Patrick87 Jul 19 '11 at 21:48
  • @Sani: it's the only ternary operator in C++, hence THE ternary operator. – Steve Jessop Jul 19 '11 at 22:30
  • @Steve Jessop: Doesn't matter if it's the only ternary operator. It is still called the conditional operator and it is a ternary operator. As I wrote: it was a note since I sometimes also use the terms "the ternary operator"/"the conditional operator" interchangeably, but it's still not correct. There could be another ternary operator introduced sometime in the future so it would be better to learn the correct term to avoid a possible mixup (as Jon Skeet would put it). – Sani Huttunen Jul 20 '11 at 00:20
  • @Sani: yeah, and the USA might in future move to a system of having multiple incumbent Presidents, but until then Obama (or other incumbent) is THE President, and Bush is A President. Funny old language ;-p – Steve Jessop Jul 20 '11 at 09:17
  • @Steve Jessop: Yes, anything can happen. The USA might even become a democracy... ;) – Sani Huttunen Jul 20 '11 at 12:50
  • Possible duplicate of [Ternary operator ?: vs if...else](http://stackoverflow.com/questions/3565368/ternary-operator-vs-if-else) – Ciro Santilli OurBigBook.com May 19 '16 at 09:52

6 Answers


There's a good chance that the ternary operator gets compiled into a cmov while the if/else results in a cmp+jmp. Just take a look at the assembly (using -S) to be sure. With optimizations enabled, it won't matter any more anyway, as any good compiler should produce the same code in both cases.

Anteru
  • On today's CPUs, a conditional move has a fixed latency while a well-predicted conditional branch is basically free. For this reason, you might not see an optimizing compiler generate `CMOV` quite as often as you'd think. – Cory Nelson Jul 19 '11 at 22:10

You could also go completely branchless and measure if it makes any difference:

int m = -(i & 1);
a = (b & m) | (c & ~m);

On today's architectures, this style of programming has grown a bit out of fashion.

fredoverflow
  • It's still useful when you have a condition in a loop which runs a large number (say billions) of times. – sumodds Apr 18 '12 at 16:15

This is a nice explanation: http://www.nynaeve.net/?p=178

Basically, there are "conditional set" processor instructions, which are faster than branching and setting a value in separate instructions.

trutheality

If there is any difference, change your compiler!

For this kind of question I use the Try Out LLVM page. It's an old release of LLVM (still using the gcc front-end), but these are old tricks.

Here is my little sample program (simplified version of yours):

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

int main (int argc, char* argv[]) {
  int N = atoi(argv[1]);

  int a = 0, d = 0, b = atoi(argv[2]), c = atoi(argv[3]);

  int i;
  for(i = 0; i < N; i++)
  {
     a = i & 1;
     if(a) a = b+i; else a = c+i;
  }

  for(i = 0; i < N; i++)
  {
     d = i & 1;
     d = d ? b+i : c+i;
  }

  printf("%d %d", a, d);

  return 0;
}

And here is the corresponding LLVM IR that was generated:

define i32 @main(i32 %argc, i8** nocapture %argv) nounwind {
entry:
  %0 = load i8** %argv, align 8                   ; <i8*> [#uses=1]
  %N = tail call i32 @atoi(i8* %0) nounwind readonly ; <i32> [#uses=5]

  %2 = getelementptr inbounds i8** %argv, i64 1   ; <i8**> [#uses=1]
  %3 = load i8** %2, align 8                      ; <i8*> [#uses=1]
  %b = tail call i32 @atoi(i8* %3) nounwind readonly ; <i32> [#uses=2]

  %5 = getelementptr inbounds i8** %argv, i64 2   ; <i8**> [#uses=1]
  %6 = load i8** %5, align 8                      ; <i8*> [#uses=1]
  %c = tail call i32 @atoi(i8* %6) nounwind readonly ; <i32> [#uses=2]

  %8 = icmp sgt i32 %N, 0                         ; <i1> [#uses=2]
  br i1 %8, label %bb, label %bb11

bb:                                               ; preds = %bb, %entry
  %9 = phi i32 [ %10, %bb ], [ 0, %entry ]        ; <i32> [#uses=2]
  %10 = add nsw i32 %9, 1                         ; <i32> [#uses=2]
  %exitcond22 = icmp eq i32 %10, %N               ; <i1> [#uses=1]
  br i1 %exitcond22, label %bb10.preheader, label %bb

bb10.preheader:                                   ; preds = %bb
  %11 = and i32 %9, 1                             ; <i32> [#uses=1]
  %12 = icmp eq i32 %11, 0                        ; <i1> [#uses=1]
  %.pn13 = select i1 %12, i32 %c, i32 %b          ; <i32> [#uses=1]
  %tmp21 = add i32 %N, -1                         ; <i32> [#uses=1]
  %a.1 = add i32 %.pn13, %tmp21                   ; <i32> [#uses=2]
  br i1 %8, label %bb6, label %bb11

bb6:                                              ; preds = %bb6, %bb10.preheader
  %13 = phi i32 [ %14, %bb6 ], [ 0, %bb10.preheader ] ; <i32> [#uses=2]
  %14 = add nsw i32 %13, 1                        ; <i32> [#uses=2]
  %exitcond = icmp eq i32 %14, %N                 ; <i1> [#uses=1]
  br i1 %exitcond, label %bb10.bb11_crit_edge, label %bb6

bb10.bb11_crit_edge:                              ; preds = %bb6
  %15 = and i32 %13, 1                            ; <i32> [#uses=1]
  %16 = icmp eq i32 %15, 0                        ; <i1> [#uses=1]
  %.pn = select i1 %16, i32 %c, i32 %b            ; <i32> [#uses=1]
  %tmp = add i32 %N, -1                           ; <i32> [#uses=1]
  %d.1 = add i32 %.pn, %tmp                       ; <i32> [#uses=1]
  br label %bb11

bb11:                                             ; preds = %bb10.bb11_crit_edge, %bb10.preheader, %entry
  %a.0 = phi i32 [ %a.1, %bb10.bb11_crit_edge ], [ %a.1, %bb10.preheader ], [ 0, %entry ] ; <i32> [#uses=1]
  %d.0 = phi i32 [ %d.1, %bb10.bb11_crit_edge ], [ 0, %bb10.preheader ], [ 0, %entry ] ; <i32> [#uses=1]
  %17 = tail call i32 (i8*, ...)* @printf(i8* noalias getelementptr inbounds ([6 x i8]* @.str, i64 0, i64 0), i32 %a.0, i32 %d.0) nounwind ; <i32> [#uses=0]
  ret i32 0
}

Okay, so it probably looks like Chinese, even though I went ahead and renamed some variables to make it a bit easier to read.

The important bits are these two blocks:

  %.pn13 = select i1 %12, i32 %c, i32 %b          ; <i32> [#uses=1]
  %tmp21 = add i32 %N, -1                         ; <i32> [#uses=1]
  %a.1 = add i32 %.pn13, %tmp21                   ; <i32> [#uses=2]

  %.pn = select i1 %16, i32 %c, i32 %b            ; <i32> [#uses=1]
  %tmp = add i32 %N, -1                           ; <i32> [#uses=1]
  %d.1 = add i32 %.pn, %tmp                       ; <i32> [#uses=1]

which set `a` and `d`, respectively.

And the conclusion is: **no difference**.

Note: in a simpler example the two variables actually got merged; it seems that here the optimizer did not detect the similarity...

Matthieu M.

Understand that it's entirely up to the compiler how it interprets the ternary expression (unless you actually force it not to with (inline) asm). It could just as easily understand the ternary expression as 'if..else' in its internal representation language, and depending on the target backend, it may choose to generate a conditional-move instruction (on x86, CMOVcc is one such instruction; there are also ones for min/max, abs, etc.). The main motivation for using a conditional move is to transfer the risk of a branch mispredict to a memory/register move operation. The caveat of this instruction is that, nearly all the time, the operand that will be conditionally loaded has to be evaluated down to register form to take advantage of it.

This means that the conditional evaluation now has to be unconditional, which will appear to increase the length of the unconditional path of the program. But understand that a branch mispredict is most often resolved by 'flushing' the pipeline, which means that the instructions that would have finished executing are discarded (turned into no-operation instructions). This means that the actual number of instructions executed is higher because of the stalls or NOPs, and the effect scales with the depth of the processor pipeline and the misprediction rate.

This brings up an interesting dilemma in determining the right heuristics. First, we know for sure that if the pipeline is too shallow or branch prediction is fully able to learn the pattern from branch history, then cmov is not worth doing. It's also not worth doing if the cost of evaluating the conditional arguments is greater, on average, than the cost of misprediction.

These are perhaps the core reasons why compilers have difficulty exploiting the cmov instruction: the heuristics depend largely on runtime profiling information. It makes more sense to use this in a JIT compiler, since it can use runtime instrumentation feedback to build stronger heuristics ("Is the branch truly unpredictable?"). For a static compiler, without training data or a profiler, it is much harder to know when this will be useful. However, as mentioned, a simple negative heuristic is available: if the compiler knows that the data set is completely random, or that forcing conditional to unconditional evaluation is costly (perhaps due to irreducible, costly operations like floating-point divides), it makes a good heuristic not to do this.

Any compiler worth its salt will do all of that. The question is what it will do after all dependable heuristics have been used up...

kchoi

Any decent compiler should generate the same code for these if optimisation is turned on.

Karoly Horvath
  • IMO this, too, should be a comment, considering the top 2 answers that are actually answers – Armen Tsirunyan Jul 19 '11 at 21:44
  • You are right, but... one of them, although answering the question, seriously fails to mention that this is only an issue with `-O0`. Compilers do a very good job at optimization, and people shouldn't do micro-optimizations like these. – Karoly Horvath Jul 19 '11 at 21:49