7

The full code is here: http://pastebin.com/MM3vWmqA

In the function fast_generator I've added comments to two statements. If you switch those statements, the code will run ~1.8x faster. If you remove the first statement the code will perform faster than the original version, but slower compared to if they were switched.

The test cases should be the following.

First - slowest. 452ms.

counter++;
i--;

Second - faster than the first one. 280ms.

i--;
counter++;

Third - faster than the first, but slower than the second one. 421ms.

i--;

The assembler output for the original statements is.

inc edx
mov eax, 6

I have verified that when switching those statements the assembler output stays the same, with the only difference of these asm instructions being interchanged.

I've tested it with VC++10 and VC++11, same behaviour. Can someone explain why switching these statements speeds up the algorithm ~1.8x? If you think that std::clock() is not accurate, change size = 7. On my machine the difference with size = 7 is 12000ms vs 7000ms.

NFRCR
  • 5,304
  • 6
  • 32
  • 37
  • 6
    List your compile options or it didn't happen. (applies to any discussion of performance) – Ben Voigt Apr 04 '12 at 21:29
  • 1
    Not related to your question, but change `int main(void)` to `int main()` in C++. – Jesse Good Apr 04 '12 at 21:30
  • In this case, a description of the platform, including processor model, RAM speed and capacity, etc. might also help. – Ben Voigt Apr 04 '12 at 21:30
  • I'm not able to repro this on either x86 or x64 with VS2010. The timings are the same. What processor are you using? – Mysticial Apr 04 '12 at 21:30
  • @Mysticial: VS2010 ships with C++ compiler version 16.00 – Ben Voigt Apr 04 '12 at 21:32
  • @BenVoigt I don't think it matters, I seem to getting the same assembly output: with the two instructions switched. – Mysticial Apr 04 '12 at 21:33
  • Jesse, (void) is perfectly fine, it's a matter of style. I'm compiling it as x86. Processor: Intel E8400 @ 4.0GHz. RAM: 4GB @ 900MHz. Compiler options: /analyze- /O2 /Gd /nologo /MT /Gm- /GL /Fa"Release\" /Oi /Oy- /Zc:forScope /Fo"Release\" /Gy /Fp"Release\C++_test.pch" /WX- /errorReport:queue /GS /Fd"Release\vc110.pdb" /fp:precise /W4 /Zi /Zc:wchar_t /EHsc – NFRCR Apr 04 '12 at 21:34
  • Oh ok, that's a Core 2 generation processor. I was testing on a Nehalem. – Mysticial Apr 04 '12 at 21:35
  • Same CPU loads during all executions? If you are on linux, use time instead – BЈовић Apr 04 '12 at 21:37
  • I'm on Windows, and I've executed each of the 3 variants tens of times. – NFRCR Apr 04 '12 at 21:38
  • FWIW, on Ubuntu 10.04, g++ 4.4.3, -O4 -g -Wall -Werror -ansi -pedantic, I get identical results, regardless of the order of the source code lines. – Robᵩ Apr 04 '12 at 21:39
  • 2
    OK - I am able to reproduce this on a Core 2 machine. +1 – Mysticial Apr 04 '12 at 21:52
  • @NFRCR: In C, its perfectly fine, but in C++ it is considered poor style ([here is a reference](http://www.parashift.com/c++-faq-lite/newbie.html#faq-29.4)). And, also here is an [SO question](http://stackoverflow.com/questions/7412274/why-add-void-to-method-parameter-list) talking about it. – Jesse Good Apr 04 '12 at 22:37
  • There's something even funnier going on than switching the instructions. When I compiled case 2 with different offsets, the timings varied wildly by up the a factor of two. It looks like instruction alignment has an effect. – Mysticial Apr 04 '12 at 22:49
  • 1
    @Jesse: Are you going to "correct" his brace style too? That has about as much merit as your complaint. And don't tell me that it's poor style just because it's redundant, plenty of redundancies in C++ are considered good style -- explicitly stating `private` in class definitions, explicitly stating `virtual` for overrides, etc. – Josh Apr 05 '12 at 02:12
  • @Jesse: I mentioned braces as a reminder that people can disagree on *style* without either of them being *wrong*. You simply told him to use *your* preferred style instead of his, and that is wholly inappropriate. That he might be in the minority in his style preference is irrelevant. – Josh Apr 05 '12 at 11:17
  • Jesse, I know that C++ doesn't require you to specify void if the function takes no arguments. However I just happen to prefer specifying that. I think specifying something explicitly will not hurt. For example it's perfectly acceptable if someone decides to write signed long int, even though just long would have been good enough. At times I have to compatible with C code, and it's not worth making your code inconsistent by using () for pure C++ and (void) for the rest. – NFRCR Apr 05 '12 at 18:46
  • @NFRC: Fine, if you want to disagree with the creator of C++ and the entire C++ community, go ahead (fun fact: Did you know that the creators of C++ wanted to disallow `f(void)`? The only reason its allowed is to be backwards compatible with C.) – Jesse Good Apr 05 '12 at 20:52
  • Time to either take this to chat, or get back on the question topic. – Robert Harvey Apr 05 '12 at 21:03

1 Answers1

3

Your slow examples are decrementing i immediately before using it to index the array at the beginning of the loop. Your fast example adds an intervening step. Without being privy to the internal architecture of the processor it's impossible to know for sure, but most likely what is happening is the processor already has buffer[i] in its pipeline, but the decrement invalidates that value, causing a pipeline stall. With the intervening step it has more time to recover the correct value before it is needed.

By the way, mov eax, 5 isn't the instruction that's doing the i--. It would be helpful to post more of the assembly context, for those of us without your compiler.

Karl Bielefeldt
  • 47,314
  • 10
  • 60
  • 94
  • Actually, the `mov eax, 5` is correct. If you look at the code, `i` is always 7 upon entering the loop. So it equates to `mov eax, 6` The OP shows 5, but probably he just forgot to change the constant. – Mysticial Apr 04 '12 at 22:17
  • This doesn't explain the behavior of the 3rd case. The `counter++` seems to be *increasing* the speed. – Mysticial Apr 04 '12 at 22:20
  • mov eax, 5 is indeed for i--, but if size = 6. I've edited the original post so it says mov eax, 6 now. – NFRCR Apr 04 '12 at 22:26