34

Does the C++ compiler optimize the multiply by two operation x*2 to a bitshift operation x<<1?

I would love to believe that yes.

Jean-François Corbett
Maxime Rouiller

13 Answers

31

Actually VS2008 optimizes this to x+x:

01391000  push        ecx  
    int x = 0;

    scanf("%d", &x);
01391001  lea         eax,[esp] 
01391004  push        eax  
01391005  push        offset string "%d" (13920F4h) 
0139100A  mov         dword ptr [esp+8],0 
01391012  call        dword ptr [__imp__scanf (13920A4h)] 

    int y = x * 2;
01391018  mov         ecx,dword ptr [esp+8] 
0139101C  lea         edx,[ecx+ecx] 

In an x64 build it is even more explicit and uses:

    int y = x * 2;
000000013FB9101E  mov         edx,dword ptr [x] 

    printf("%d", y);
000000013FB91022  lea         rcx,[string "%d" (13FB921B0h)] 
000000013FB91029  add         edx,edx 

This is with the optimization setting 'Maximize speed' (/O2).

Rob Walker
  • gcc does the same thing (I've seen it use lea several times, and compiling your sample program it used add with both -m32 and -m64). – CesarB Oct 24 '08 at 23:39
  • A quick way to view this information: put a breakpoint at the line of interest, run in debug mode. When it stops at the line, right click and choose "Go To Disassembly". Other methods here: https://stackoverflow.com/questions/1020498/how-to-view-the-assembly-behind-the-code-using-visual-c – RoG May 08 '17 at 01:11
24

This article from Raymond Chen could be interesting:

When is x/2 different from x>>1? : http://blogs.msdn.com/oldnewthing/archive/2005/05/27/422551.aspx

Quoting Raymond:

Of course, the compiler is free to recognize this and rewrite your multiplication or shift operation. In fact, it is very likely to do this, because x+x is more easily pairable than a multiplication or shift. Your shift or multiply-by-two is probably going to be rewritten as something closer to an add eax, eax instruction.

[...]

Even if you assume that the shift fills with the sign bit, the result of the shift and the divide are different if x is negative.

(-1) / 2 ≡ 0
(-1) >> 1 ≡ -1

[...]

The moral of the story is to write what you mean. If you want to divide by two, then write "/2", not ">>1".

We can only assume it is wise to tell the compiler what you want, not how you want it done: the compiler is better than a human at optimizing small-scale code (thanks to Daemin for pointing out this subtlety). If you really want optimization, use a profiler and study your algorithms' efficiency.

paercebal
  • At least for optimising small-scale code such as the question asks about, the human should still be writing the algorithm :-) – Dominik Grabiec Oct 25 '08 at 00:49
  • The implementation of the shift operator in C++ is actually platform specific; some platforms do have an arithmetic shift. However the one on the x86 rounds incorrectly. This means that the compiler can't safely use SAR in that case while a human can (knowing the consequences). – Jasper Bekkers Dec 10 '08 at 12:24
  • However, that still doesn't mean that SAR is faster in the general case, because of the instruction pairing mentioned in the quote. – Jasper Bekkers Dec 10 '08 at 12:25
  • @JasperBekkers: What do you mean it rounds "incorrectly"? The only time I can think of where I've ever wanted -1/2 to equal 0 rather than -1, I was generating code for a Z80 using a compiler where using >>1 and adjusting was much faster than using a divide operator (and speed mattered). – supercat Apr 24 '15 at 03:38
  • @supercat That was a mistake on my end. I stand corrected. The Intel Manual actually explicitly states this. "The SAR and SHR instructions can be used to perform signed or unsigned division, respectively, of the destination operand by powers of 2. For example, using the SAR instruction to shift a signed integer 1 bit to the right divides the value by 2." – Jasper Bekkers May 02 '15 at 11:48
  • Having said that, however, the C++ standard still doesn't guarantee right shifts are arithmetic shifts for signed integers. "If E1 has a signed type and a negative value, the resulting value is implementation-defined" – Jasper Bekkers May 02 '15 at 11:52
12

VS 2008 optimized mine to x << 1.

    x = x * 2;
004013E7  mov         eax,dword ptr [x] 
004013EA  shl         eax,1 
004013EC  mov         dword ptr [x],eax 

EDIT: This was using VS's default "Debug" configuration with optimization disabled (/Od). Using any of the optimization switches (/O1, /O2 (VS "Retail"), or /Ox) results in the add-self code Rob posted. Also, just for good measure, I verified that x = x << 1 is indeed treated the same way as x = x * 2 by the cl compiler in both /Od and /Ox. So, in summary, cl.exe version 15.00.30729.01 for x86 treats * 2 and << 1 identically, and I expect nearly all other recent compilers do the same.

C. Dragon 76
11

Not if x is a float it won't.

David Arno
  • Of course, bit shifting a float has an entirely different result than multiplication by powers of two, n'est-ce pas? – wprl Oct 24 '08 at 20:10
  • For a float, multiplication by 2 is just a matter of adding 1 to the exponent. But I don't think compilers optimize that. – rslite Oct 24 '08 at 20:17
  • @rslite, given the efficiency of floating point units in CPUs these days, it is likely that it would be quicker to let the FPU do the multiplication, rather than doing an integer addition to just part of the number's bit set. – David Arno Oct 24 '08 at 20:20
  • What about negative values? Bit shifting should only work for unsigned variables, right? – Treb Oct 24 '08 at 20:20
  • @Treb, there are two kinds of bit shifts, arithmetic shifts and logical shifts. The former keeps sign, the latter does not. One would hope the compiler is smart enough to figure out which to use. – rmeador Oct 24 '08 at 20:45
5

Yes. They also optimize other similar operations, such as multiplying by non-powers of two that can be rewritten as the sums of some shifts. They will also optimize divisions by powers of 2 into right-shifts, but beware that when working with signed integers, the two operations are different! The compiler has to emit some extra bit twiddling instructions to make sure the results are the same for positive and negative numbers, but it's still faster than doing a division. It also similarly optimizes moduli by powers of 2.

Adam Rosenfield
  • The compiler actually doesn't have to emit any extra instructions for a divide - that's what SAL is for - it handles two's complement numbers properly. – Branan Oct 24 '08 at 20:18
  • Err yeah (where by SAL you meant SAR); it's for signed moduli by powers of 2 in which it emits extra bit twiddling code – Adam Rosenfield Oct 24 '08 at 20:29
5

The answer is "if it is faster" (or smaller). This depends on the target architecture heavily as well as the register usage model for a given compiler. In general, the answer is "yes, always" as this is a very simple peephole optimization to implement and is usually a decent win.

plinth
4

That's only the start of what optimizers can do. To see what your compiler does, look for the switch that causes it to emit assembler source. For the Digital Mars compilers, the output assembler can be examined with the OBJ2ASM tool. If you want to learn how your compiler works, looking at the assembler output can be very illuminating.

Walter Bright
1

I'm sure they all do these kind of optimizations, but I wonder if they are still relevant. Older processors did multiplication by shifting and adding, which could take a number of cycles to complete. Modern processors, on the other hand, have a set of barrel-shifters which can do all the necessary shifts and additions simultaneously in one clock cycle or less. Has anyone actually benchmarked whether these optimizations really help?

Ferruccio
0

Yes, they will.

Andy Lester
0

Unless something is specified in a language's standard you'll never get a guaranteed answer to such a question. When in doubt, have your compiler spit out assembly code and check. That's going to be the only way to really know.

dwj
0

@Ferruccio Barletta

That's a good question. I went Googling to try to find the answer.

I couldn't find answers for Intel processors directly, but this page has someone who tried to time things. It shows shifts to be more than twice as fast as adds and multiplies. Bit shifts are so simple (where a multiply could be a shift and an addition) that this makes sense.

So then I Googled AMD, and found an old optimization guide for the Athlon from 2002 that lists the fastest ways to multiply numbers by constants between 2 and 32. Interestingly, it depends on the number. Some are adds, some shifts. It's on page 122.

A guide for the Athlon 64 shows the same thing (page 164 or so). It says multiplies are 3 (in 32-bit) or 4 (in 64-bit) cycle operations, where shifts are 1 and adds are 2.

It seems it is still useful as an optimization.

Ignoring cycle counts though, this kind of method would prevent you from tying up the multiplication execution units (possibly), so if you were doing lots of multiplications in a tight loop where some use constants and some don't the extra scheduling room might be useful.

But that's speculation.

MBCook
0

Why do you think that's an optimization?

Why not 2*x → x+x? Or maybe the multiplication operation is as fast as the left-shift operation (maybe only if only one bit is set in the multiplicand)? If you never use the result, why not leave it out of the compiled output? If the compiler already loaded 2 into some register, maybe the multiplication instruction will be faster, e.g. if we'd have to load the shift count first. Maybe the shift operation is larger, and your inner loop would no longer fit into the prefetch buffer of the CPU, thus penalizing performance? Maybe the compiler can prove that the only time you call your function x will have the value 37, so x*2 can be replaced by 74? Maybe you're doing 2*x where x is a loop count (very common, though implicit, when looping over two-byte objects)? Then the compiler can change the loop from

    for(int x = 0; x < count; ++x) ...2*x...

to the equivalent (leaving aside pathologies)

    int count2 = count * 2;
    for(int x = 0; x < count2; x += 2) ...x...

which replaces count multiplications with a single multiplication, or it might be able to leverage the lea instruction which combines the multiplication with a memory read.

My point is: there are millions of factors deciding whether replacing x*2 by x<<1 yields a faster binary. An optimizing compiler will try to generate the fastest code for the program it is given, not for an isolated operation. Therefore optimization results for the same code can vary widely depending on the surrounding code, and they may not be trivial at all.

Generally, there are very few benchmarks that show large differences between compilers. It is therefore a fair assumption that compilers are doing a good job because if there were cheap optimizations left, at least one of the compilers would implement them -- and all the others would follow in their next release.

tobi_s
-9

It depends on what compiler you have. Visual C++ for example is notoriously poor in optimizing. If you edit your post to say what compiler you are using, it would be easier to answer.

T.E.D.
  • It is? That's the first I'm hearing about it. Actually, I usually hear that it's one of the best PC-level compilers for optimizations. – James Curran Oct 24 '08 at 20:46
  • Any proof that VC++ is "notoriously poor in optimizing"? For instance, show us the machine code generated from the same source by different compilers. – Nemanja Trifunovic Oct 24 '08 at 20:58
  • @James: http://www.ddj.com/184405641 <- big article. http://www.ddj.com/showArticle.jhtml?documentID=ddj0405a&pgno=12 <- table summarizing the conclusion. I am not saying this one article is the standard, but me being in the Linux community I hear bad comments about VC++ not often, but enough. – J.J. Oct 24 '08 at 20:58
  • @James: Notice on that link I posted VC 7 does considerable better than VC 6. – J.J. Oct 24 '08 at 20:59
  • @J.J. By reading the link, VC7.1 (i.e. VC2003) is quite correctly ranked. The "notoriously poor in optimizing" is a fallacy. It's not the best, perhaps, but it was as good as, if not better the GCC equivalent. – paercebal Oct 24 '08 at 21:08
  • I ported a specialized DB library to Windows fairly recently and the test script was much slower than it was in Linux compiled with GCC 4.3. I didn't dig into causes, it could be that Windows just sucks doing console in/out streaming. – Zan Lynx Oct 25 '08 at 03:03
  • This response offers nothing useful; the author's opinion about VC++, although it could be true in some cases, is unhelpful. – MarkR Oct 25 '08 at 10:27