1

I saw the following code (Question 3 on the PUZZLERSWORLD.COM site):

Code 1:

for (i = 0; i < 1000; i++)
    for (j = 0; j < 100; j++)
        x = y;

Code 2:

for (i = 0; i < 100; i++)
    for (j = 0; j < 1000; j++)
        x = y;

Which code will execute faster?

Options:

a) Code 1 and Code 2 are of same speed,
b) Code 1,
c) Code 2,
d) Can't Say

Answer:

c)

So my question is: why is the second code faster than the first?

chqrlie
Jayesh
  • How did you check that the second code snippet is faster? It seems you're just (blindly) believing a random website (where most code examples don't even have valid syntax) – UnholySheep Aug 17 '17 at 09:06
  • Offtopic: what was your answer to the second question? There is no "compile time error" option, and if I fix the errors, I still miss the infinite loop option. Other questions have similar problems. – mch Aug 17 '17 at 09:09
  • @mch removed my second question. – Jayesh Aug 17 '17 at 09:13
  • I did not mean your question, I meant the question #2 from the link you posted. – mch Aug 17 '17 at 09:14
  • Come on guys, just ignore the trivial case of the code being optimised down to one line. – alk Aug 17 '17 at 09:15
  • I think this one deleted answer is the answer. – alk Aug 17 '17 at 09:16
  • add e) Code 2 is never slower than Code 1. That is the correct answer – stefan bachert Aug 17 '17 at 09:19
  • @stefanbachert not even that, you can't prove it. It's just **very unlikely** to find a compiler emitting faster code for code 1. –  Aug 17 '17 at 09:21
  • @Felix Palmen: I am sorry, we are comparing code 1 and code 2. You are just talking about code 1. You missed something. – stefan bachert Aug 17 '17 at 09:26
  • @stefanbachert no, you miss the fact that C is defined in terms of an abstract machine and a compiler could do anything with these codes. Although you're probably right that no real-world compiler would ever emit something slower for code 2, there's no way to **prove** that. –  Aug 17 '17 at 09:27
  • @Felix Palmen: I think we could assume that code 1 and code 2 are compiled by the same compiler using the same optimisations. If the compiler does or does not optimise depending on the timer state, you are right – stefan bachert Aug 17 '17 at 09:32
  • @stefanbachert the problem is making assumptions about the compiler. In practice, I agree with your conclusion. It can't be justified by the rules of the language, though. –  Aug 17 '17 at 09:42
  • @Jayesh Just a general bit of advice. Ignore that puzzle page because more than half of the answers on it are incorrect. – Art Aug 17 '17 at 11:17
  • The real answer is "it depends". It can be affected by what `i`, `j`, `x` and `y` are. Are they variables, if so what type? Are any `volatile`? what if `y` is a macro rather than the name of a variable? etc etc. Once you get past those questions, more questions come up with answers that depend on the compiler .... some compilers will optimise out loops, different optimisation settings may be used, etc etc. – Peter Aug 17 '17 at 13:32
  • Wow! Any site that calls the language "C/C++" ought to be closed down. When it also has [a "correct" answer to `printf("%d %d %d",i,++i,i++);`](https://stackoverflow.com/questions/949433/why-are-these-constructs-using-undefined-behavior), it is just hopeless. Run away, and run fast! – Bo Persson Aug 17 '17 at 15:00

7 Answers

13

Unless x or y or both are declared as volatile, both code snippets can be reduced to:

x = y;

Example:

int f(int y)
{
    int x;

    for (int i = 0; i < 1000; i++)
        for (int j = 0; j < 100; j++)
            x = y;
    return x;
}

yields:

f(int):
        mov     eax, edi
        ret
Grzegorz Szpetkowski
  • I believe this is the correct answer, since this is extremely likely to happen with any real-world compiler. Pondering which snippet is fastest with no optimizations enabled is a meaningless exercise. – Lundin Aug 17 '17 at 09:45
  • The generated code depends on whether the variables are volatile or not. But the time will be the same in both cases, with minor cache-related differences. – 0___________ Aug 17 '17 at 09:54
6

The question seems to assume that incrementing will be done e.g. in a register and it's more efficient not to have to switch between incrementing i and j and reinitializing j too often. These assumptions might hold on some hardware, but it's unlikely to make a measurable difference.

The correct answer would be "a), with any decent compiler", because the code generated by a decent compiler won't execute anything other than a single x = y; for this source (so both snippets compile to exactly the same executable code).


As for "how to check", you have two possibilities:

  • compile to assembly code (e.g. with gcc's -S option) and compare (this is a good way with code as simple as this).
  • measure: run each snippet very often (like a million times) and time it; a sketch of such a harness follows.
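A minimal sketch of such a timing harness (added here for illustration, not part of the original answer; x and y are declared volatile so the loops survive optimization, which admittedly changes the very code being measured):

#include <stdio.h>
#include <time.h>

/* volatile so the compiler cannot collapse the loops into a single x = y; */
volatile int x, y;

static void code1(void)
{
    for (int i = 0; i < 1000; i++)
        for (int j = 0; j < 100; j++)
            x = y;
}

static void code2(void)
{
    for (int i = 0; i < 100; i++)
        for (int j = 0; j < 1000; j++)
            x = y;
}

/* time many repetitions of f so the total dwarfs the clock() resolution */
static double time_it(void (*f)(void), int reps)
{
    clock_t t0 = clock();
    for (int r = 0; r < reps; r++)
        f();
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

int main(void)
{
    printf("code 1: %.3f s\n", time_it(code1, 10000));
    printf("code 2: %.3f s\n", time_it(code2, 10000));
    return 0;
}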
  • I like your reference to measuring, because that's the essence of finding performance issues, and too many people don't get it. – CodeMonkey Aug 17 '17 at 09:40
6

Answer d) is correct. In general you can't say much about performance without knowing all the details: which compiler is used, which processor architecture, and so on.

I'd stop using that website as a learning resource and find another one.

CodeMonkey
  • I think one could prove that code 2 will always be faster than, or the same speed as, code 1. – stefan bachert Aug 17 '17 at 09:12
  • @stefanbachert Try compiling this in clang. Both become constant time because the loops are completely unnecessary. Other compilers might not optimize this. But you just don't know. – CodeMonkey Aug 17 '17 at 09:13
  • The real answer is not in the option list. The real answer is: Code 2 will never be slower than Code 1, when Code 1 and Code 2 are compiled with the same compiler. – stefan bachert Aug 17 '17 at 09:18
  • I am sorry that you are giving up. However, I would recommend looking at the argument, not at the votes. – stefan bachert Aug 17 '17 at 09:23
  • @stefanbachert I'm not giving up, I just don't have that much time to devote to you. I wouldn't make absolute claims like this when it comes to performance; I can see your revised statement *probably* being correct, but there's no way to prove it for all compilers. From the given answers, d) is still the correct answer though. You just can't say. – CodeMonkey Aug 17 '17 at 09:35
  • @stefanbachert Look at Fabel's counterexample on your answer, that would be an unlikely, but possible way that Code 1 is faster. – CodeMonkey Aug 17 '17 at 09:42
  • There is no counterexample. Assuming `f()` is not the question, it is another case – stefan bachert Aug 17 '17 at 10:10
  • @stefanbachert: the second code can be much slower than the first when compiled on the DS9K. There is no certainty in optimization: with any decent compiler, answer a) is correct; in general I would choose answer d), but given the website, one is reduced to wild guesses, as the definitions of `i`, `j`, `x` and `y` are not known, nor the rest of the program, nor the target environment, nor the compiler specifics and configuration... – chqrlie Aug 17 '17 at 10:39
  • @stefanbachert it is a counterexample, and Lundin gave another one (an 8-bit compiler with no/bad optimizations). I'm not asking you to give up, just asking you to actually think through the counterexamples and see that there is a difference between *proving* something and a *good enough* abstraction. – CodeMonkey Aug 17 '17 at 11:03
  • On DS9K a program may have finished before it has been started – stefan bachert Aug 17 '17 at 11:07
1

A loop has some overhead.

The overhead of the inner loop is incurred 1000 times in code 1, but only 100 times in code 2.

Therefore code 2 is faster or the same.
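To put rough numbers on this (a back-of-the-envelope count added for illustration, assuming no optimization and one fixed-cost setup per entry into the inner loop): both codes perform the same 1000 × 100 = 100,000 assignments, but code 1 enters its inner loop 1000 times while code 2 enters it only 100 times, so the entire difference amounts to 900 extra inner-loop setups.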

Even if the compiler optimizes very well and removes the loops, code 2 will never be slower.

I choose option e)

PROOF

ASSUMPTIONS

  • using the same compiler for both codes
  • when using optimisations, using the same ones for both codes
  • using only rational optimisations, no random or weird ones
  • any code needs time to execute

case a)

the compiler does not optimize at all; then the overhead of the inner loop applies and code 2 is faster. That would match option d) or e).

case b)

the compiler does optimize

b1) removing the loops altogether

Then both codes result in the same code. Options a) or e).

b2) combining both loops into one

Then both codes result in the same code. Options a) or e).

b3) reordering the loops (this would be a silly sub-optimization in the original code)

Then both codes result in the same code. Options a) or e).

So after combining all cases, only e) is true.


Fabel's "counterexample" is not valid, since there is no reason to apply an optimisation in code 1 but not in code 2.

stefan bachert
  • I don't understand the downvotes, as this is right. The initialization, the condition checking and the incrementing are the loop's overhead. This would apply if the compiler did **not** optimize at all, or if the assignment weren't the same all the time. – Andre Kampling Aug 17 '17 at 09:12
  • @AndreKampling the only difference would be more initializations of `j`. Even with a really "stupid" compiler, this is such a tiny difference, you wouldn't be able to measure it. With different assignments, an optimizing compiler would **still** emit the same code. –  Aug 17 '17 at 09:15
  • @FelixPalmen: Yes, that's true; it also wouldn't make a difference in Big-O notation. But I don't think this should be downvoted. – Andre Kampling Aug 17 '17 at 09:16
  • @AndreKampling it's better now, "code 2 will never be slower" is **most probably** correct. Still you can't prove it for **any** compiler. –  Aug 17 '17 at 09:17
  • @FelixPalmen: Yes, right; he should have mentioned that, or the assumptions he made for his answer. – Andre Kampling Aug 17 '17 at 09:18
  • @FelixPalmen is right on target. *Probably* is not what the question asks. – CodeMonkey Aug 17 '17 at 09:19
  • If i and x or y are declared volatile, the compiler might have a threshold for unrolling such a loop somewhere between 100 and 1000, making Code 1 much faster. Or if just x or y are volatile and the compiler isn't smart enough to combine the two loops for optimal unrolling. (This is of course stretching it.) – Fabel Aug 17 '17 at 09:33
  • @Fabel thank you, that's a great counter-example to the *never*! – CodeMonkey Aug 17 '17 at 09:41
  • If you want to make these kinds of (incorrect) assumptions about how the compiler does not optimize code, then how about this: a comparison against 1000 is 16 bits and therefore very CPU-intense on an 8-bit CPU, while a comparison against 100 is 8 bits and a single instruction. The first example has 1000 comparisons against 1000, while the second example has 100000 comparisons against 1000. The second example is therefore _extremely_ slow compared to the first one. This is very much true for all PIC, AVR, HC08, 8051 etc etc. – Lundin Aug 17 '17 at 09:53
  • But this is not how the compiler will generate the code, so neither this answer nor my example is the slightest bit relevant (they both make the same erroneous assumption about no optimizations being present), assuming real-world compilers are being used. – Lundin Aug 17 '17 at 09:54
  • @Lundin Talking about embedded systems... in that kind of scenario the example code could make sense, sort of. If what appears to be variables are actually (confusingly named) macros for accessing hardware registers. Apart from the recklessness of using a hardware register as counter in a loop, it could be reasonable code for copying a pixmap to a LED-sign or something like that. (As it's very common to read/write a stream of data one word at a time from/to a single register.) And where do you even begin making up fictional examples around that scenario (or macros in general for that matter)? – Fabel Aug 17 '17 at 18:37
1

Because initializing a loop requires a tiny amount of overhead (setting the counter variables to zero): in Code 1 that overhead is executed 1+1000 times, while in Code 2 just 1+100 times. Using a real compiler, though, would at least remove the loops and reduce the code to just x = y; in both cases, since it knows the result will be the same. Or, if it couldn't do that (if the inner loop contained, say, f(i,j);), it might decide to unroll the inner loop in Code 1 but not in Code 2, making Code 1 much faster (probably at least double the speed, but it depends on the ISA and ABI of course).

Fabel
  • Why should a reasonable compiler decide to unroll in code 1 but not in code 2? That is no compiler, that is a poor random generator. – stefan bachert Aug 17 '17 at 10:17
  • If it has a threshold somewhere between 100 and 1000 iterations to consider the inner loop to be worth unrolling. Which would not be an unreasonable threshold, but still just hypothetical of course. (It also assumes the compiler isn't smart enough to combine the two loops and make the same partial unroll in both cases.) – Fabel Aug 17 '17 at 17:28
1

The generated code depends on the type of the variables, because the compiler will optimise the code in different ways.

The answer: if any optimisations are on, the code executes exactly the same for foo; if the variables are volatile (as in foo1), there is a difference, as the register is loaded many more times.

int x, y;
volatile int m, n;

void foo(void)
{
    for (int i = 0; i < 1000; i++)
        for (int j = 0; j < 100; j++)
            x = y;

    for (int i = 0; i < 100; i++)
        for (int j = 0; j < 1000; j++)
            x = y;
}

void foo1(void)
{
    for (int i = 0; i < 1000; i++)
        for (int j = 0; j < 100; j++)
            m = n;

    for (int i = 0; i < 100; i++)
        for (int j = 0; j < 1000; j++)
            m = n;
}

and the generated ARM code (the x86 version is here: https://godbolt.org/g/tSRKxy):

foo():
        ldr     r3, .L2
        ldr     r2, [r3, #4]
        str     r2, [r3]
        bx      lr
.L2:
        .word   .LANCHOR0
foo1():
        mov     r0, #1000
        ldr     r3, .L14
.L6:
        mov     r2, #100
.L5:
        ldr     r1, [r3, #8]
        subs    r2, r2, #1
        str     r1, [r3, #12]
        bne     .L5
        subs    r0, r0, #1
        bne     .L6
        mov     r0, #100
.L8:
        mov     r2, #1000
.L7:
        ldr     r1, [r3, #8]
        subs    r2, r2, #1
        str     r1, [r3, #12]
        bne     .L7
        subs    r0, r0, #1
        bne     .L8
        bx      lr
.L14:
        .word   .LANCHOR0
x:
y:
n:
m:
0___________
1

The question is missing important information needed for a definitive answer.

If for example we have these programs:

int main(void) {
    int i, j, x, y;
    for (i = 0; i < 1000; i++)
        for (j = 0; j < 100; j++)
            x = y;
}

and

int main(void) {
    int i, j, x, y;
    for (i = 0; i < 100; i++)
        for (j = 0; j < 1000; j++)
            x = y;
}

Both have undefined behavior because they read the value of the uninitialized variable y.

If we have this:

int main(void) {
    int i;
    unsigned char j;
    int x, y = 0;
    for (i = 0; i < 1000; i++)
        for (j = 0; j < 100; j++)
            x = y;
}

and

int main(void) {
    int i;
    unsigned char j;
    int x, y = 0;
    for (i = 0; i < 100; i++)
        for (j = 0; j < 1000; j++)
            x = y;
}

The second code runs forever, because j is an unsigned char and can never reach 1000, while the first finishes instantly.
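The wraparound can be demonstrated directly; here is a small stand-alone snippet (added for illustration, not part of the original argument). An unsigned char counts modulo 256, so a condition like j < 1000 can never become false:

#include <stdio.h>

int main(void)
{
    unsigned char j = 250;
    /* unsigned char arithmetic wraps from 255 back to 0 (modulo 256) */
    for (int k = 0; k < 10; k++)
        printf("%d ", j++);
    printf("\n");   /* prints: 250 251 252 253 254 255 0 1 2 3 */
    return 0;
}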

With decent compilers and correct definitions, the code is almost completely optimized out, so answer a) applies:

int main(void) {
    int i, j, x, y = 0;
    for (i = 0; i < 100; i++)
        for (j = 0; j < 1000; j++)
            x = y;
}

compiled with Godbolt's compiler explorer, produces:

main:
        xorl    %eax, %eax
        ret

The site expects you to choose c) because the initial overhead of the inner loop is incurred 10 times more often in the first code fragment than in the second. While there is some logic in this analysis, it might not be the dominant factor, and as explained above other considerations must be studied too. In any case, the roughly 900 extra instructions would be negligible compared to the several hundreds of thousands executed by the rest of the code, making the difference hardly measurable.

In performance analysis, humility is a cardinal virtue. Use profiling tools and carefully write thorough benchmarking code to determine if optimization is useful at all. It is a common mistake to write complex code in an attempt at optimizing, only to get code that produces incorrect results.


Note that the linked page is full of broken questions. For example Q2:

#include
fun(int i)
{
    int j=0;
    while (i & i-1) {
        j++;   
    }
    printf("%d",j);
}

main(){
    f(100);
}

Options: a. 100 b. 99 c. 2 d. 3

The code does not compile, and were it corrected, it has an infinite loop, so none of the answers are correct. For the curious, the correct function to compute the number of bits set in i is:

void fun(unsigned i) {
    int j = 0;
    while (i) {
        i = i & (i - 1);
        j++;   
    }
    printf("%d\n", j);
}
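With this corrected version, fun(100) prints 3: 100 is 1100100 in binary, which has three bits set (which happens to match option d. of the quiz).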

Out of 10 questions, 7 have incorrect answers and the remaining 3 (Q1, Q4 and Q5) are trick questions with syntax errors. A very poor-quality Q&A site indeed.

chqrlie