Shifting bits left and right is apparently faster than multiplication and division operations on most, maybe even all, CPUs if you happen to be using a power of 2. However, it can reduce the clarity of code for some readers and some algorithms. Is bit-shifting really necessary for performance, or can I expect the compiler or VM to notice the case and optimize it (in particular, when the power-of-2 is a literal)? I am mainly interested in the Java and .NET behavior but welcome insights into other language implementations as well.
- 19 Why ask us? Try it, then you'll know! Write the code both ways, get out a stopwatch, and see what happens. – Eric Lippert Jul 22 '09 at 22:26
- 14 And the answers will be invalid as soon as a new version of the JVM or CLR is released. Or when someone tries it on an embedded system. Or a mobile CPU. Or a GPU. Or when the moon is full. Or when the planets align. Or... (you get the idea.) – Greg D Jul 23 '09 at 05:35
- 4 @Eric why do more work than necessary? It takes 1 minute to ask the question on SO, go get a sandwich, and come back to find the answer *and* a detailed explanation of why it is the way it is. – Rex M Aug 10 '09 at 15:41
- 2 Our opinions will be out of date faster than those from an experiment, i.e. already out of date, or out of date with the next CPU, VM, etc. So just write the code as clearly as possible so the optimiser can understand it, and thus optimise it. – ctrl-alt-delor May 21 '12 at 11:49
- 2 Despite the answers below, which all swear modern compilers are way too smart to miss this sort of thing, my own tests on C# 4.0 show that for an `int i`, the statement `(i << 3)` is 15% to 25% faster than the equivalent `(i * 8)` in Release mode. This is consistent across various conditional execution flows, so it does not appear to be an optimization fluke. One should definitely test for themselves in performance-critical loops. The JVMs are more optimized than the .NET CLR, so it wouldn't surprise me if this particular inefficiency is .NET-only. – Special Sauce Nov 27 '15 at 08:44
12 Answers
Almost any environment worth its salt will optimize this away for you. And if it doesn't, you've got bigger fish to fry. Seriously, do not waste one more second thinking about this. You will know when you have performance problems. And after you run a profiler, you will know what is causing it, and it should be fairly clear how to fix it.
You will never hear anyone say "my application was too slow, then I started randomly replacing `x * 2` with `x << 1` and everything was fixed!" Performance problems are generally solved by finding a way to do an order of magnitude less work, not by finding a way to do the same work 1% faster.

- 6 Use whichever is clearer. If it's a constant factor in a multiplication or division, the compiler will almost certainly do the right thing. – Jason C Jul 22 '09 at 22:41
- 4 Really good answer - premature optimization is almost always a waste of time. – David Thielen May 22 '12 at 23:53
- 4 The behavior of `x>>1` and `x/2` is different if x may be an odd negative number. Unless both operations would be correct (e.g. because x will never be an odd negative number, or code elsewhere will deal with an off-by-one condition), the question of which is faster is irrelevant. I've more often seen `x/2` used when `x>>=1` would have been correct than vice versa [look for programs whose graphics have an ugly seam at the origin X and Y coordinates]. – supercat Nov 01 '13 at 20:16
- Normally I don't appreciate it when people add their own opinion to an answer, but this is just pure truth. Bit shifting may have been a worthwhile performance optimization in the 90s, but now it is just silly to justify it as a "performance hack" given the complexity and power of modern compilers. – Kaiser Keister Apr 04 '19 at 02:24
Most compilers today will do more than convert multiply or divide by a power-of-two to shift operations. When optimizing, many compilers can optimize a multiply or divide with a compile time constant even if it's not a power of 2. Often a multiply or divide can be decomposed to a series of shifts and adds, and if that series of operations will be faster than the multiply or divide, the compiler will use it.
For division by a constant, the compiler can often convert the operation to a multiply by a 'magic number' followed by a shift. This can be a major clock-cycle saver since multiplication is often much faster than a division operation.
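To make the 'magic number' idea concrete, here is an illustrative sketch (not taken from the answer above). The constant pair below is the well-known multiplier/shift combination for dividing a non-negative 32-bit int by 10; a real compiler also emits a small fixup so that negative values round correctly.

```java
// Sketch only: replaces x / 10 with a multiply and a shift, valid for x >= 0.
// 0x66666667 with a shift of 34 is the standard magic pair for a 32-bit divide by 10.
static int divideBy10(int x) {
    return (int) ((x * 0x66666667L) >>> 34);
}

// divideBy10(7) == 0 and divideBy10(129) == 12, matching 7 / 10 and 129 / 10.
```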
Henry S. Warren's book, Hacker's Delight, has a wealth of information on this topic, which is also covered quite well on the book's companion website.
Anyway, all this boils down to allowing the compiler to take care of the tedious details of micro-optimizations. It's been years since doing your own shifts outsmarted the compiler.

- 7 You should note that `x/2` and `x>>1` will compute different things if x is an odd negative number. If one cares about what a computation will do with odd negative numbers, one should use the form which yields the correct answer. – supercat Nov 01 '13 at 20:17
- Also very fond of Hacker's Delight. Unfortunately, the companion website is no longer available. Interested persons can use the [2019-07-17 capture](https://web.archive.org/web/20190717132328/http://www.hackersdelight.org:80/), which appears to be the final complete capture. – Chris W. Johnson Mar 02 '21 at 15:47
Humans are wrong in these cases:
- 99% of the time when they try to second-guess modern (and all future) compilers.
- 99.9% of the time when they try to second-guess modern (and all future) JITs at the same time.
- 99.999% of the time when they try to second-guess modern (and all future) CPU optimizations.

Program in a way that accurately describes what you want to accomplish, not how to do it. Future versions of the JIT, VM, compiler, and CPU can all be independently improved and optimized. If you specify something so tiny and specific, you lose the benefit of all future optimizations.

You can almost certainly depend on the literal-power-of-two multiplication optimisation to a shift operation. This is one of the first optimisations that students of compiler construction will learn. :)
However, I don't think there's any guarantee for this. Your source code should reflect your intent, rather than trying to tell the optimiser what to do. If you're making a quantity larger, use multiplication. If you're moving a bit field from one place to another (think RGB colour manipulation), use a shift operation. Either way, your source code will reflect what you are actually doing.
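As a small illustration of that distinction (the names here are invented for the example), multiplication says "scale this quantity" while shifts say "move this bit field":

```java
class Intent {
    // Arithmetic intent: the value should be twice as large.
    static int doubled(int quantity) {
        return quantity * 2;
    }

    // Bit-field intent: pack three 8-bit channels into one int.
    static int packRgb(int r, int g, int b) {
        return (r << 16) | (g << 8) | b;
    }

    // Bit-field intent: pull the green channel back out.
    static int greenOf(int rgb) {
        return (rgb >> 8) & 0xFF;
    }
}
```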

- What if you're really interested in performance, disregarding source code clarity? For example, in a hash function... Will it help in .NET or Java? – Jorge Córdoba Jul 22 '09 at 23:23
- 2 It's possible that such a micro-optimisation may help for a *particular* JIT compiler. However, your code will live longer than the version of the JIT compiler on which it runs, and it's impossible to tell what optimisations the next version of the JIT compiler will perform. It's even possible that something you did to make it faster in the previous version has *worse* performance in the next version! This situation is very different from languages like C, where the compile step is performed only once. JIT-compiled bytecode languages can increase in performance long after the code is written. – Greg Hewgill Jul 22 '09 at 23:31
Note that shifting right and division will (in Java, certainly) give different results for negative, odd numbers.
int a = -7;
System.out.println("Shift: "+(a >> 1));
System.out.println("Div: "+(a / 2));
Prints:
Shift: -4
Div: -3
Since Java doesn't have any unsigned numbers it's not really possible for a Java compiler to optimise this.
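For reference (this is an addition, not part of the original answer), `Math.floorDiv` from Java 8 rounds the same way the arithmetic shift does, so the difference between the two forms is purely the rounding direction for negative operands:

```java
public class RoundingDemo {
    public static void main(String[] args) {
        int a = -7;
        System.out.println(a / 2);               // -3: '/' truncates toward zero
        System.out.println(a >> 1);              // -4: '>>' rounds toward negative infinity
        System.out.println(Math.floorDiv(a, 2)); // -4: same rounding as the shift
    }
}
```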

On the computers I tested, integer division is 4 to 10 times slower than other operations.
While compilers may replace division by a power of 2 with a shift, so that you see no difference, division by numbers that are not powers of 2 is significantly slower.
For example, I have a (graphics) program with many, many divisions by 255. Actually, my computation is:
r = (((top.R - bottom.R) * alpha + (bottom.R * 255)) * 0x8081) >> 23;
I can confirm that it is a lot faster than my previous computation:
r = ((top.R - bottom.R) * alpha + (bottom.R * 255)) / 255;
So no, compilers cannot do all the optimization tricks for you.
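If you want to convince yourself that the `0x8081` form really matches a division by 255, a quick brute-force check (a sketch added here, not from the original answer) over the range the blend can produce looks like this:

```java
public class MagicDiv255Check {
    public static void main(String[] args) {
        // 65025 == 255 * 255, the largest value the blend above can produce.
        for (int x = 0; x <= 65025; x++) {
            if ((x * 0x8081) >> 23 != x / 255) {
                System.out.println("mismatch at " + x);
                return;
            }
        }
        System.out.println("(x * 0x8081) >> 23 == x / 255 for 0..65025");
    }
}
```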
I would ask "what are you doing that it would matter?". First design your code for readability and maintainability. The likelyhood that doing bit shifting verses standard multiplication will make a performance difference is EXTREMELY small.

It is hardware dependent. If we are talking about a microcontroller or an i386, then shifting might be faster, but, as several answers state, your compiler will usually do the optimization for you.
On modern (Pentium Pro and beyond) hardware, pipelining makes this totally irrelevant, and straying from the beaten path usually means you lose a lot more optimizations than you can gain.
Micro-optimizations are not only a waste of your time, they are also extremely difficult to get right.

According to the results of this microbenchmark, shifting is twice as fast as dividing (Oracle Java 1.7.0_72).
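Micro-benchmarks like this are easy to get wrong with wall-clock timing; a harness such as JMH takes care of warmup and dead-code elimination. A minimal sketch of how one might measure it (class and field names are illustrative, and the JMH dependency is assumed to be on the classpath):

```java
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class ShiftVsDivide {
    int x = 123456789;

    @Benchmark
    public int divide() {
        return x / 2;   // the JIT may turn this into a shift plus a sign fixup
    }

    @Benchmark
    public int shift() {
        return x >> 1;  // note: not equivalent to x / 2 when x is negative and odd
    }
}
```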

If the compiler (compile-time constant) or JIT (runtime constant) knows that the divisor or multiplicand is a power of two and integer arithmetic is being performed, it will convert it to a shift for you.

I am stunned as I just wrote this code and realized that shifting by one is actually slower than multiplying by 2!
(EDIT: changed the code to stop overflowing after Michael Myers' suggestion, but the results are the same! What is wrong here?)
import java.util.Date;

public class Test {
    public static void main(String[] args) {
        Date before = new Date();
        for (int j = 1; j < 50000000; j++) {
            int a = 1;
            for (int i = 0; i < 10; i++) {
                a *= 2;
            }
        }
        Date after = new Date();
        System.out.println("Multiplying " + (after.getTime() - before.getTime()) + " milliseconds");

        before = new Date();
        for (int j = 1; j < 50000000; j++) {
            int a = 1;
            for (int i = 0; i < 10; i++) {
                a = a << 1;
            }
        }
        after = new Date();
        System.out.println("Shifting " + (after.getTime() - before.getTime()) + " milliseconds");
    }
}
The results are:
Multiplying 639 milliseconds
Shifting 718 milliseconds

- 1 Can you benchmark it so that it doesn't overflow? If I remember right, overflowing could be expensive. – Michael Myers Jul 22 '09 at 21:53
- 3 That's virtually identical for a single benchmark on a machine not specifically prepared for it. – Joel Coehoorn Jul 22 '09 at 21:56
- 2 Have you tried running it multiple times to see if the results hold? – luiscubal Jul 22 '09 at 21:57
- FYI, most of the time: Multiplying 1219 milliseconds / Shifting 1219 milliseconds. Sometimes +/- 20 ms, but normal deviation. – MicSim Jul 22 '09 at 22:12
- Weird... can more people test this and post their results? I am REALLY curious as to why this is happening. – Savvas Dalkitsis Jul 22 '09 at 22:23
- Makes sense enough: the path is Java -> bytecode -> i586/CISC -> i586/RISC. All those steps will be better optimized for the common `*` than the rare `<<`. – H H Jul 22 '09 at 22:35
- 1 Continued: and the actual `<<` or `*` will be executed in parallel with the for-loop j++/branch code, which will probably dominate. – H H Jul 22 '09 at 22:38
- This may be naive (my knowledge of architecture is archaic at this point), but wouldn't multiply go to the math co-processor, where it is crunched up out of band while the CPU starts on the next operation, whereas the `>>` is probably done directly in the ALU, which takes CPU time? – Bill K Jul 22 '09 at 22:45
- Could it be that this is dead code? The benchmark would need to print `a` after the loop to prevent that. – Sergei Chicherin Oct 24 '11 at 13:28
- 2 You're initializing new variables for the multiplication case, whereas you're re-using the date variables for the bit-shift case. There may be bookkeeping for GC there. Also, the bit shift might do better if you used `<<=` instead of `<<`. That would make the bit-shift case more analogous to the `*=` that you are using for the multiplication case. Now, you might think that the compiler would optimize that out, but I guess that depends upon the compiler. The multiplication is probably optimized to a bit shift, but the explicit assignment may not have been optimized to a shift in place. – Carl May 04 '12 at 07:19
- I got Multiplying 1521 milliseconds / Shifting 1430 milliseconds for 500,000,000 operations in Eclipse on my Mac, but I believe the for loop is where the real time is spent (addition is not much slower/faster than multiplication). – Infigon Jul 22 '22 at 14:13
Most compilers will turn multiplication and division into bit shifts when appropriate. It is one of the easiest optimizations to do. So, you should do what is more easily readable and appropriate for the given task.
