-3

In C, is indexing an array faster than the ?: operator?

For example, would `(const int[]){8, 14}[N > 10]` be faster than `N > 10 ? 14 : 8`?

  • 3
    Did you compile it and look at what code got generated? I don't think there's any way to answer your question generally. – Carl Norum Feb 08 '21 at 16:41
  • 3
    Why don't you measure it or check the generated code? – Bodo Feb 08 '21 at 16:41
  • 3
    Using a ternary operator instead of an `if`/`else` is bad enough, but `(const int[]){8, 14}[N > 10]` is only suitable for the [IOCCC](https://www.ioccc.org/). – Andrew Henle Feb 08 '21 at 16:44
  • Which is better, Ford or Chevy? -- In other words, this sort of question is almost completely unanswerable. If you care, you'll have to measure, on your machine with your compiler. Often, the difference is so slight, you'll have to make millions of trials before you get a result that's statistically significant. Often, the difference will be so slight that it's meaningless. Finally, any difference often won't matter because the actual performance of your program is dominated by other concerns. This kind of microoptimization rarely matters. (Although, it's true, sometimes it does.) – Steve Summit Feb 08 '21 at 16:55
  • 2
    What, no `8 + 6*(N>10)`? Or `fma(6, N>10, 8)`? – Eric Postpischil Feb 08 '21 at 16:58
  • 1
    CPUs and compilers keep getting more and more optimized at using subexpressions like `N > 10` to *make decisions*, which is of course what they're used for the vast majority of the time. Turning the result of `N > 10` back into an actual 0 or 1 value (which, yes, is what the C abstract machine says it has) might require the compiler to emit some more cumbersome, roundabout machine code. Bottom line: Don't get too clever with this sort of thing, that's the compiler's job. Your job as a programmer is to express what you want the code to do, clearly and concisely. – Steve Summit Feb 08 '21 at 17:01
  • 1
    This is micro-optimization. It depends on the situation, but I think most of the time indexing is clearly faster (but please use a `static const` array). Branching can be very costly if the CPU mispredicted the condition, _because it then has to flush its pipeline_. Look at this very interesting answer: https://stackoverflow.com/a/11227902 – prapin Feb 08 '21 at 19:53
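
The comments above suggest measuring rather than guessing. Below is a minimal benchmark sketch, assuming a POSIX `clock_gettime` timer is available; the function names, value mix and iteration count are illustrative, and with `-O3` the compiler may inline or partially fold the loops, so treat any numbers as a rough indication only:

#include <stdio.h>
#include <time.h>

static int ternary(int n) { return n > 10 ? 14 : 8; }
static int array(int n)   { return (const int[]){8, 14}[n > 10]; }

/* Time many calls through a function pointer; the volatile sink keeps the
   results "used" so the loop is not optimized away entirely. */
static double bench(int (*f)(int), int iters)
{
    struct timespec t0, t1;
    volatile int sink = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters; i++)
        sink += f(i & 31);              /* values both above and below 10 */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
    const int iters = 100000000;
    printf("ternary: %f s\n", bench(ternary, iters));
    printf("array:   %f s\n", bench(array, iters));
    return 0;
}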

2 Answers

5

Stick with the ternary operator:

  • It is simpler
  • It is fewer characters to type
  • It is easier to read and understand
  • It is more maintainable
  • It is likely not the main bottleneck in your application
  • For the CPU it is a simple comparison
  • Compilers are clever: if the array solution were faster, they would already generate the same code for both variants

Mandatory quote (emphasis mine):

Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: **premature optimization is the root of all evil**. Yet we should not pass up our opportunities in that critical 3%.

— Donald Knuth • https://wiki.c2.com/?PrematureOptimization


Now that that's out of the way, let's compare what compilers actually produce.

#include <stdlib.h>
int ternary(int n) { return n > 10 ? 14 : 8; }
int array(int n) { return (const int[]){8, 14}[n > 10]; }

Compile with (g)cc 10.2.1 on Ubuntu, with optimizations enabled:

$ cc -O3 -S -fno-stack-protector -fno-asynchronous-unwind-tables ternary.c

`-S` stops after compilation and does not assemble. You will end up with a `.s` file which contains the generated assembly code. (The `-fno-…` flags disable additional code generation that is not required for our example.)
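
Optionally, GCC's `-fverbose-asm` flag can be added to annotate the `.s` output with comments about source variables, which can make the listing easier to follow (not needed for the comparison below):

$ cc -O3 -S -fverbose-asm -fno-stack-protector -fno-asynchronous-unwind-tables ternary.c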

`ternary.s` assembly code, with lines unrelated to the two functions removed:

ternary:
    endbr64
    cmpl    $10, %edi              # compare n with 10
    movl    $8, %edx               # edx = 8
    movl    $14, %eax              # eax = 14
    cmovle  %edx, %eax             # if n <= 10, take 8 instead of 14
    ret
array:
    endbr64
    movq    .LC0(%rip), %rax       # load both constants (8 bytes) from .rodata
    movq    %rax, -8(%rsp)         # copy them to the stack (the compound literal)
    xorl    %eax, %eax             # zero rax so setg produces a clean 0/1
    cmpl    $10, %edi              # compare n with 10
    setg    %al                    # al = (n > 10)
    movl    -8(%rsp,%rax,4), %eax  # index the stack array with the 0/1 result
    ret
.LC0:
    .long   8
    .long   14

If you compare them, you will notice a lot more instructions for the array version: 6 vs. 4 (not counting `endbr64` and `ret`). There is no reason to write the more complicated code which every developer has to read twice; the shorter, straightforward code compiles to more efficient machine code.
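
For completeness, the branchless arithmetic form mentioned in the question's comments (`8 + 6*(N>10)`) expresses the same value without an array; this is only a sketch for comparison, not a recommendation, and whether it compiles to anything different from the plain ternary depends on the compiler and target:

/* Arithmetic variant from the comments; the function name is illustrative.
   Compilers typically emit branchless code for this, though not necessarily
   the same instructions as for the ternary. */
int arith(int n) { return 8 + 6 * (n > 10); }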

knittl
  • 246,190
  • 53
  • 318
  • 364
2

Use of the compound literal (and arrays in general) will be much less efficient, as the array is created (by current real-world compilers) regardless of the optimization level. Worse, it is created on the stack, rather than just indexing static constant data (which would still be slower, or at least higher latency, than an ALU select operation like x86 `cmov` or AArch64 `csel`, which most modern ISAs have).

I have tested it using all compilers I use (including Keil and IAR) and some I don't use (icc and clang).

int foo(int N)
{
    return (const int[]){8, 14}[N > 10]; 
}

int bar(int N)
{
    return N > 10 ? 14 : 8;
}

foo:
        mov     rax, QWORD PTR .LC0[rip]     # load 8 bytes from .rodata
        mov     QWORD PTR [rsp-8], rax       # store both elements to the stack
        xor     eax, eax                      # prepare a zeroed reg for setcc
        cmp     edi, 10
        setg    al                            # materialize N>10 as a 0/1 integer
        mov     eax, DWORD PTR [rsp-8+rax*4]  # index the array with it
        ret

bar:
        cmp     edi, 10
        mov     edx, 8               # set up registers with both constants
        mov     eax, 14
        cmovle  eax, edx             # ALU select operation on FLAGS from CMP
        ret
.LC0:
        .long   8
        .long   14

https://godbolt.org/z/qK65Gv
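
For comparison, the `static const` array variant suggested in the comments keeps the lookup table in read-only storage instead of building a compound literal on the stack on every call. A sketch (the function and array names are illustrative; the generated code still depends on the compiler):

int baz(int N)
{
    /* Lookup table in .rodata: no per-call copy to the stack, but the
       compiler may still emit a load rather than a cmov/csel select. */
    static const int lut[2] = {8, 14};
    return lut[N > 10];
}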

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
0___________
  • 60,014
  • 4
  • 34
  • 74
  • 1
    By changing the array from a compound literal to a `static const int[]`, the 2 `mov`s are gone –  Feb 08 '21 at 17:11
  • 2
    The standard requires no such thing because whether the compound literal is actually created in memory is not observable behavior. – Eric Postpischil Feb 08 '21 at 17:16
  • 1
    You claim that the compound literal "must be created". There is no such requirement in the standard, as Eric explained. You cannot deduce the contents of the standard from the behaviour of the compilers. The correct wording is that "current compilers do not seem to realize this". – Antti Haapala -- Слава Україні Feb 08 '21 at 17:23
  • @AnttiHaapala we live in the real world and use real compilers. BTW, I observe similar behaviour with "normal" arrays. I do not have time to study the standard, but there must be something in it if the compilers supported by godbolt all behave the same way. – 0___________ Feb 08 '21 at 17:33
  • 1
    “I do not have time to study the standard” but you have time to write answers claiming to say what the standard requires? The fact that a compiler only needs to generate code producing observable behavior is fundamental to understanding optimization. It does not take much study; the compiler is only required to generate code that matches the abstract machine in accesses to volatile objects, data written to files by the end of program execution, and input and output dynamics of interactive devices (and “Communication with the environment,” C 2018 7.22.4). Compound literals are not among that. – Eric Postpischil Feb 08 '21 at 17:40
  • @EricPostpischil OK. Your virtual machine using a virtual compiler and virtual optimizations may generate the same code. My real ones do not. What is more important for someone who tries to find out what is more efficient today? – 0___________ Feb 08 '21 at 17:45
  • 2
    @0___________ Practical, real-world empiricism is fine. Excessive reliance on the Standard can indeed turn into an exercise in navel-gazing. But where you get into trouble (and, I suspect, invite downvotes) is when you say "*will* be much less efficient as it *has* to be created despite the optimization level" (emphasis mine). Your statement would have been equally useful to the OP, without drawing the ire of the hawkeyed, if you had said "tends to be less efficient as compilers typically create it despite the optimization level". – Steve Summit Feb 08 '21 at 18:45
  • 1
    I edited your answer to point out that it's not just "creating" the array (copying to the stack I think you mean) that's problematic, it's using a load at all. And the possibility of the cache miss on the static constant it's copying from. These are the 2 real options in asm, so it makes some sense to actually talk about how they perform, not just argue from the assumption that the array is slower in asm without saying anything about that fact. Anyway, I think my edit fixes that: CPUs have efficient "select" instructions. – Peter Cordes Feb 08 '21 at 20:58