0

I am wondering if I may optimize something if I change the "huge" numbers by some offsets (or something similar) in the switch-case statements. So I have done a test :

#include <iostream>
#include <iomanip>
#include <chrono>

int main() {

    uint32_t f = 0x12345688;
    std::chrono::time_point<std::chrono::system_clock> start, end;

    int i = -1;

    start = std::chrono::system_clock::now();
    switch (f)
    {
        case 0x1234500 : i = 0; break;
        case 0x1234522 : i = 2; break;
        case 0x1234555 : i = 5; break;
        case 0x1234588 : i = 8; break;
        default : break;
    }
    end = std::chrono::system_clock::now();
    std::chrono::duration<double> elapsed_seconds = end-start;
    std::cout << "elapsed time: " << elapsed_seconds.count() << "s\n";

    int j = -1;
    start = std::chrono::system_clock::now();
    switch (f & 0xf)
    {
        case (0x1234500 & 0xf) : j = 0; break;
        case (0x1234522 & 0xf) : j = 2; break;
        case (0x1234555 & 0xf) : j = 5; break;
        case (0x1234588 & 0xf) : j = 8; break;
        default : break;
    }
    end = std::chrono::system_clock::now();
    elapsed_seconds = end-start;
    std::cout << "elapsed time: " << elapsed_seconds.count() << "s\n";

    return 0;
}

and it seems that always the second switch-case is faster.

Is there a reason why the small numbers in the cases is making the statement faster (the "gaps" between the cases are the same)?

sop
  • 3,445
  • 8
  • 41
  • 84
  • 12
    I would imagine this is not something you can test reliably with one iteration. – Rotem Dec 08 '16 at 09:13
  • 2
    @Rotem Especially given that writing to a stream with a flush is a very time-consuming operation, which is MUCH slower than any possible switch – alexeykuzmin0 Dec 08 '16 at 09:14
  • Possible duplicate of http://stackoverflow.com/questions/6805026/is-switch-faster-than-if – Sanjay-sopho Dec 08 '16 at 09:30
  • @Rotem : I ran the example more times, and always the second one was faster – sop Dec 08 '16 at 09:31
  • 2
    Have you considered that your compiler has probably optimized the switch statements out completely? – paddy Dec 08 '16 at 09:32
  • 1
    If you could use such easy tricks to make a basic building block of the language faster, then why should the compiler not do it always behind the scenes? – Christian Hackl Dec 08 '16 at 09:33
  • @alexeykuzmin0 : even If I remove the streams I get the second one faster with at least 1e-07 seconds – sop Dec 08 '16 at 09:33
  • Probably because `i` didn't need to be set the second time around. Unless I'm reading your intention wrong. Are you sure you pasted that value for `f` correctly? The first switch statement doesn't match anything. The second will match. – paddy Dec 08 '16 at 09:34
  • I used `j` and the second one is still faster... – sop Dec 08 '16 at 09:35
  • 4
    To get anything meaningful out of this, just disassemble the optimized code and see what you got. – Lundin Dec 08 '16 at 09:36
  • When I try, I get "elapsed time: 0s" in both cases. Visual C++ 2015, `/EHsc /Za /W4`. It's a 5-year-old notebook, Windows 7 64-bit, 1.9Ghz, 4GB RAM. – Christian Hackl Dec 08 '16 at 09:37
  • Where is `i` declared anyway? Is it volatile? – Lundin Dec 08 '16 at 09:37
  • Come to think of it, why should you get anything more than 0 **seconds** in both cases on a reasonably normal computer anyway? It's just a single switch statement each... I think you should post your real code (this one's also missing the declaration of `i`). – Christian Hackl Dec 08 '16 at 09:39
  • I tried it [here](https://ideone.com/) – sop Dec 08 '16 at 09:43
  • I made a test by putting both parts in a loop of 1 billion times, got exactly the same result for both. – A.S.H Dec 08 '16 at 09:44
  • @sop - The compiler is a lot smarter than you give it credit for. All kinds of little tweaks you can do, the compiler will do as well. And then some. It can do really heavy transformations on the resulting machine code, like in [my favorite example](http://stackoverflow.com/a/11639305/597607), were 10 lines of C++ code results in 4-5 machine instructions. – Bo Persson Dec 08 '16 at 10:17
  • 1
    @A.S.H Because no compiler worth their salt will loop over code that obviously doesn’t change its answer. – Konrad Rudolph Dec 08 '16 at 10:46
  • @KonradRudolph I agree, just wanted to point out that in any case, it **is** necessary to test on a *large sample* to draw meaningful statistical conclusions. *Statistics 101*. – A.S.H Dec 08 '16 at 10:49

2 Answers2

4

This benchmarking method is nonsense, because the compiler can statically determine the value of i between the two cases. Your actual code will probably end up something like this:

start = std::chrono::system_clock::now();
switch (f)
{
    case 0x1234500 : i = 0; break;
    case 0x1234522 : i = 2; break;
    case 0x1234555 : i = 5; break;
    case 0x1234588 : i = 8; break;
    default : break;
}
end = std::chrono::system_clock::now();
std::chrono::duration<double> elapsed_seconds = end-start;
std::cout << "elapsed time: " << elapsed_seconds.count() << "s\n";

// i = -1;  this line isn't needed, the value isn't used, lets optimize it away

start = std::chrono::system_clock::now();
// Oh great, we already know the value of i
// Because nothing in the previos code could have affected f
// i will get the same value as it did above, and we removed i = -1
// So lets optimize away this pointless code here
end = std::chrono::system_clock::now();
elapsed_seconds = end-start;

You could attempt to declare i as volatile but that may prevent the compiler from other doing optimizations too, so it might not make a valid benchmark test either.

The i = -1; between tests is meaningless, since the compiler can deduct that this value isn't used. So that code will just get removed as well.

The best way might be to read the constant 0x12345688 from a file or user input, so that the compiler can't assume anything about it. You need to do this twice, for both test cases.

Generally: when doing benchmarking like this, always disassemble the code, to verify that your test isn't nonsense.

Lundin
  • 195,001
  • 40
  • 254
  • 396
  • the question is "is it faster if the numbers are smaller ?" so, I do not understand the idea of removing i. I can use another int j = -1; and change its value depending on the f, so ... I'll update the code once more – sop Dec 08 '16 at 09:59
  • @sop Yes, I understand the question. However, since you are using fixed constants as inputs, the compiler will optimize the code and your tests will not be meaningful. As described above. And no, changing the variable name won't affect that. The problem is the _input_, the variable `f`. – Lundin Dec 08 '16 at 10:04
  • so, if I change the value of f, will it be better? – sop Dec 08 '16 at 10:13
  • @sop Just read the answer. You need to give `f` a value in run-time. For example by reading it from a file. – Lundin Dec 08 '16 at 10:16
  • 2
    Also, tossing the original code into a loop that runs a billion times won't achieve anything, because the loop is just dealing with compile-time constants. It would get optimized away. – Lundin Dec 08 '16 at 10:17
  • This is actually really tricky to benchmark (even if correcting all the obvious mistakes) for the reasons listed in the answer and the comments. The only way to get an authoritative answer is therefore probably to inspect the machine code. – Konrad Rudolph Dec 08 '16 at 10:45
-1

The most important thing for performance is that the numbers are consecutive. This is because the compiler can use the value to index the memory address which specifies where the switch should jump to handle the case in question. Big gaps makes the indexing impossible, since it would jump very far and the compiler would have to insert huge chunks of empty code to fill unused space.

Small numbers can make it marginally faster - you may only save one instruction because the compiler may subtract and then be able to use indexing still.

All of the above is compiler-dependent. A stupid compiler may not use indexing at all.

Sven Nilsson
  • 1,861
  • 10
  • 11
  • 3
    This answer is pure speculation. It’s *plausible* but it’s not backed up by any evidence whatsoever. Lundin’s answer shows why the question, the way it’s currently asked, doesn’t admit a simple answer. And the fact that OP chose to mark this answer accepted shows that they haven’t been reading to a word written there. – Konrad Rudolph Dec 08 '16 at 10:41
  • I disagree, Lundin's answer fails to address the actual question. The OP did not ask whether his benchmarking method was correct or not. – Sven Nilsson Dec 08 '16 at 13:15
  • 1
    The question was "Is there a reason why the small numbers in the cases is making the statement faster". But there is no proof that it does, incorrect benchmarking tricked the OP. Your answer is correct for the general case, but there is no difference in consecutiveness between the two cases in the question, so there is no reason to believe that they would give different performance. – Lundin Dec 08 '16 at 14:08