Which operator will run faster among << and * in loop?

Question

I am learning c/c++ code optimization. Writing a simple for loop program, which part of for loop will faster than other w.r.t << and * operator. Code snippet below.

#include <iostream>
#define MAX 1000
int main()
{
int i;

for(i=1; i<= MAX; i= i<<1)
    {
    std::cout<<i <<" ";
    }   

for(i=1; i<= MAX; i= i*2)
    {
    std::cout<<i <<" ";
    }

return 0;
}

Did you try to measure it? Likely they'll compile to the same machine instructions. — TartanLlama, Jan 28 '16 at 08:07
Operators usually don't affect time complexity. However, algorithm you use does. — Aleksandar Makragić, Jan 28 '16 at 08:07
@V.Kravchenko I said usually. If you have O(n^2) complexity last thing you need to worry about is "how fast operator runs". — Aleksandar Makragić, Jan 28 '16 at 08:11
Possible duplicate of [Is shifting bits faster than multiplying and dividing in Java? .NET?](http://stackoverflow.com/questions/1168451/is-shifting-bits-faster-than-multiplying-and-dividing-in-java-net) — jblixr, Jan 28 '16 at 08:12
I hope you are not using above program as a benchmark. Bitshift/multiplication is insignificant compared `cout`, so you won't get meaningful results. — user694733, Jan 28 '16 at 08:19
@jblixr That is not a duplicate. SO uses a tag system. Java is not C++ and C# is not C. — 2501, Jan 28 '16 at 08:21
@2501 The type question he asked is applicable to all languages because it is based on compiler. if `<<` is faster in c++ than `*` in c++ then obviously `<<` in java is faster than `*` in java — jblixr, Jan 28 '16 at 08:28
Your example is pointless because `std::cout` will take far more time than a multiplication or a shift. — Jabberwocky, Jan 28 '16 at 08:38
@jblixr You are wrong. In C++, for signed integers, *2 can sometimes be faster than <<1 because using it is undefined behavior in some cases where the other is well-defined. This does not apply to java at all. — Marc Glisse, Jan 28 '16 at 08:39
These are really bad test cases. The loops are going to run 10 times each. You'll spend more time loading them into cache than executing them, and the loops are completely I/O bound anyway. The performance of `<<` vs. `*` won't make any difference in these examples. Why don't you try creating loops that execute on the order of a million times, with at least 10% of the time being spent on your `<<` or `*` operations? — Tom Karzes, Jan 28 '16 at 08:49

RobClucas · Answer 1 · 2016-01-28T08:59:09.140

First, here is the assembly of using the multiplication operator (I added the comments):

.LC0:
        .string " "
main:
        pushq   %rbp
        movl    $10, %ebp          // Part of loop
        pushq   %rbx
        movl    $1, %ebx           // Part of loop
        subq    $8, %rsp
.L2:
        movl    %ebx, %esi
        movl    std::cout, %edi
        addl    %ebx, %ebx         // Part of loop
        call    std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
        movl    $1, %edx
        movl    $.LC0, %esi
        movq    %rax, %rdi
        call    std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
        subl    $1, %ebp           // Part of loop
        jne     .L2                // Part of loop
        addq    $8, %rsp 
        xorl    %eax, %eax
        popq    %rbx
        popq    %rbp
        ret
        subq    $8, %rsp
        movl    std::__ioinit, %edi
        call    std::ios_base::Init::Init()
        movl    $__dso_handle, %edx
        movl    std::__ioinit, %esi
        movl    std::ios_base::Init::~Init(), %edi
        addq    $8, %rsp
        jmp     __cxa_atexit

And here is the assembly using the shift operator:

.LC0:
        .string " "
main:
        pushq   %rbp
        movl    $10, %ebp
        pushq   %rbx
        movl    $1, %ebx
        subq    $8, %rsp
.L2:
        movl    %ebx, %esi
        movl    std::cout, %edi
        addl    %ebx, %ebx
        call    std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
        movl    $1, %edx
        movl    $.LC0, %esi
        movq    %rax, %rdi
        call    std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
        subl    $1, %ebp
        jne     .L2
        addq    $8, %rsp
        xorl    %eax, %eax
        popq    %rbx
        popq    %rbp
        ret
        subq    $8, %rsp
        movl    std::__ioinit, %edi
        call    std::ios_base::Init::Init()
        movl    $__dso_handle, %edx
        movl    std::__ioinit, %esi
        movl    std::ios_base::Init::~Init(), %edi
        addq    $8, %rsp
        jmp     __cxa_atexit

Which is exactly the same (I only compiled with GCC 5.3 for x86 using -O3, so this may not be the case with other compilers and other architectures).

After looking at the link posted in the comments by Chris, the automatic highlighting makes it seems as if the shift operator requires additional instructions:

subl    $1, %ebp
jne     .L2
movl    $10, %ebp    // Actually not part of loop
movl    $1, %ebx     // Actually not part of loop

Compared to the multiplication, which is:

subl    $1, %ebp
jne     .L3

Which is different to the original assembly I posted (since both were exactly the same).

As mentioned in the comments by Revolver_Ocelot, the instructions

movl    $10, %ebp    
movl    $1, %ebx

Are just setting the counters for the next loop, so the loop components actually compile to same assembly (for this compiler and architecture), as was shown by the separate comparison.

Update: Based on Flikk's comment saying the assembly will be different, here is an unoptimised version, which shows that they are the same.

You forgot to mark `movl $10, %ebp` as part of the loop. It is setting loop counter, so it is important. — Revolver_Ocelot, Jan 28 '16 at 08:27
I expected this to happen with optimization. Can't test myself right now, but i think without the compiler omptimization it would look different. — Flikk, Jan 28 '16 at 08:34
@Flikk, I just tried it. Actually it produced the same assembly for both cases. — RobClucas, Jan 28 '16 at 08:37
"_is seems as if the shift operator requires additional instructions_" Two `mov` instructions were setting registers for next loop. Do not believe automatic highlighting, check for yourself. — Revolver_Ocelot, Jan 28 '16 at 08:41
This is the assembly for one specific architecture, and the timing will be for a specific implementation of that architecture (did you even specify it?). C runs on many, many different architectures, so I'd say this answer isn't very general at all, even if it happens to match what OP cares about (does it?) — Tom Karzes, Jan 28 '16 at 08:44
@Revolver_Ocelot thanks, if you do not understand assembly then you will not be able to evaluate the accuracy of the highlighting. I was trying to make it simple for the OP to realise that they compiled to the same thing using GCC. — RobClucas, Jan 28 '16 at 08:55
@TomKarzes, yes it is for one specific architecture. To post an answer that covers all compilers and all architectures would be extremely long. I have updated the answer to state that this is specific to the compiler and architecture I tested with. I think that it can at least give the OP some insight as to what happens in both cases, even if it doesn't cover every possible architecture. — RobClucas, Jan 28 '16 at 08:58

score 1 · Answer 2 · answered Jan 28 '16 at 13:32

You are trying micro-optimisations. When I started out the saying was that beginners save microseconds, real programmers save milliseconds. Nowadays beginners save nanoseconds, real programmers save microseconds :-)

You are outputting i to std::cout. At that point, how i has been calculated is so totally irrelevant because the output takes a few thousand times longer.

Now let's look at what the program does. At some point you get an overflow, which gives you undefined behaviour, which means anything can happen. Correct programs are thousand times more important than fast ones. So any attempt at optimisation is absolutely pointless when your code is already broken.

Now let's look at what you are doing. Ask yourself: Does the compiler figure out what you are doing? The compiler will obviously see that each variant multiplies i by 2. The compiler will then use the fastest way to multiply i by 2, whatever method that is. Probably adding i to itself :-) So what you are doing is again pointless; the compiler is more clever than that.

Back to micro-optimisations: They are utterly ineffective, because the compiler is ten times better at micro-optimisations than you are. Real programmers don't try to make an operation a tiny weeny bit faster; they develop data structures and algorithms that take a lot fewer operations to begin with! They measure first to find out what takes time and optimise where it counts instead of trying to optimise random bits. And they measure afterwards to see if any attempt at optimisation actually worked.

pranjuldb · Answer 3 · 2016-01-29T09:09:53.747

I can give you a simple idea which will help you in judging which is faster between i *= 2 and i = i<<1.

Update MAX = 10¹⁸ and use long long int as data type. Now use time() function to measure the running time of both the for loops independently. And just find out the difference in running time.

Note : Since there will be at max 60 iterations so it might be possible that the difference in the running time would not be comparable so in this case you can use BigInteger library.

Result : Bit operations are faster.

Which operator will run faster among << and * in loop?

3 Answers3