1

Question #1

In Java, is shifting multiple times more expensive than using a single statement to shift the by the same number?

For example, is

int x = 5;
x = x << 16;

Faster than

int x = 5;
for (int i=0; i<16; ++i) {
    x = x << 1;
}

Further, what about

int x = 5;
for (int i=0; i<16; ++i) {
    x = x*2;
}

Edit: What is the precise performance of "x << 16"? Is it the same speed as "x << 1"?

Question #2

Is there a resource online that I can utilize to determine various bitwise operation performances in Java, so that I do not have to waste the time of StackOverflow users? :-)

Kirby
  • 3,649
  • 3
  • 26
  • 28
  • 2
    Doing multi-bit shift is almost always going to be an order of magnitude faster than using the one-bit loop. – Hot Licks Apr 25 '12 at 12:22
  • Plus, orders of magnitude faster than iterating a loop in Java... – Damon Apr 25 '12 at 12:23
  • (And, for 99.9% of uses, it's not worth even worrying about this sort of thing. Create one String and you've done 1000 times as much work.) – Hot Licks Apr 25 '12 at 12:23
  • Thanks **Hot Licks** and **Damon**. Can you elaborate on what exactly happens with a "multi-bit shift"? In particular, is it equivalent to a single statement? For example, is "x << 1" the same in performance as "x << 16"? – Kirby Apr 25 '12 at 12:24
  • 2
    The correct answer is that it's absolutely clear that using a single shift gives you unbeatable performance. It translates directly into a single machine instruction. As for the other idioms, you could still get lucky and have a JIT compiler realize on its own that the summa summarum of your loop is just that one shift. But why do it. – Marko Topolnik Apr 25 '12 at 12:24
  • The multiply is apt to be the worst of all, depending on implementation details. Or just as bad as the single shift. – Hot Licks Apr 25 '12 at 12:25
  • Thanks, **Marko**. Now with a "single shift", is the performance of "x << 1" equivalent to that of "x << 16"? – Kirby Apr 25 '12 at 12:25
  • 3
    Now I remembered that even at the CPU hardware level there is a dedicated circuit, so-called *barrel-shifter*, that makes a shift operation take a single clock tick. Introduced on the Intel 80386 in 1987. If you need a still more direct answer -- yes, `x << 1` is exactly the same as `x << anything else`. – Marko Topolnik Apr 25 '12 at 12:27
  • 2
    All "normal" computers have multi-bit shift operation that accomplishes a shift of N bits in one instruction. (On some crude RISC processors it may take 2-3 instructions, but anything takes 2-3 on those.) Long shifts may take a few more internal cycles than shorter shifts, but the difference would be hardly measurable. – Hot Licks Apr 25 '12 at 12:27
  • 1
    @Kirby `javap -c YourClass` might help next time you have doubts. – soulcheck Apr 25 '12 at 12:28
  • Thank you, **Marko** and **Hot Licks**. That's precisely what I wanted to know. – Kirby Apr 25 '12 at 12:28
  • Excellent, soulcheck! I had never heard of that before. – Kirby Apr 25 '12 at 12:30
  • 1
    @soulcheck -- javap just show the bytecodes, which is nowhere near revealing the actual cost of an operation. Some bytecodes translate into less than one machine instruction, others into thousands. – Hot Licks Apr 25 '12 at 12:32
  • @HotLicks it was a suggestion to the question from before the edit (pretty sure a loop containing something + `ishl` will get translated to at least the same number of instructions as single `ishl`) – soulcheck Apr 25 '12 at 15:15
  • A good name to know in optimization is Agner Fog: http://www.agner.org/optimize/ – DarenW Aug 20 '12 at 20:15

4 Answers4

5

...so that I do not have to waste the time of StackOverflow users?

You're wasting your own time too. Write a full prototype of your application, profile it, and then optimize it. I'm quite sure you'll find that the bottle neck is not due to the bit-shifting.

This smells premature optimization long way.

What is the precise performance of "x << 16"? Is it the same speed as "x << 1"?

Yes it's the same. But technically speaking it actually depends on the compiler, JVM implementation, JIT, CPU architecture,.. The Java Specification does not put any restrictions on execution times in such cases.

aioobe
  • 413,195
  • 112
  • 811
  • 826
  • 1
    If I have the option of doing something faster vs. doing something slower, why not know how to do it faster? – Kirby Apr 25 '12 at 12:27
  • 1
    @Kirby -- There is a certain "rightness" to learning a few of these tricks, but in general the shorter and more straight-forward a code sequence is, the more efficient it is. Clarity and efficiency usually go together (until you start looping, at least). – Hot Licks Apr 25 '12 at 12:29
  • @aioobe, readability is certainly important, but knowing what is actually happening in an application is important to me. – Kirby Apr 25 '12 at 12:32
  • @Hot Licks: Good point. I assumed this was the case, but am thankful for the explanations that have been given. – Kirby Apr 25 '12 at 12:33
2

In terms of basic logic, a single shift would give much greater performance.

When using the for loop version, for every iteration of the loop, the loop's terminating condition is checked, i is incremented, a bitwise operation is carried out and an assignment is made to x.

When using a single shift, a single bitwise operation is carried out and an assignment is made to x.

And as others have said, this really does seem like premature optimisation.

For the sake of answering your question though, logically, the first example is faster than the others.

That said, depending on the language and compiler, it is possible that the compiler would see that your for loop always runs 16 times and then proceeds to optimise your code, changing it to x << 16. If this is the case, you would see no difference between each code example you gave.

Liam George Betsworth
  • 18,373
  • 5
  • 39
  • 42
  • This obviously depends on how well the code is optimized. I could very well imagine that the JIT will unroll such loop into a straight sequence of shifts, which are then collapsed into a single one. Point is that you can't say how long something will take to execute by looking at the syntax on the Java level. – aioobe Apr 25 '12 at 12:39
  • Of course. In terms of basic logic though, without any compiler or language optimisation, I have described the flow of instructions. – Liam George Betsworth Apr 25 '12 at 12:41
  • 1
    Not that I disagree with your end answer, but your logic doesn't hold, since a shift by more than 1 could, logically (but not intelligently), be implemented internally as a loop of single shifts. To bring a more obvious example, `Collection.addAll` typically has the same performance as a loop of `Collection.add`, since the `addAll` itself just implements that loop. With your logic, the one operation should logically be faster than the loop. – yshavit Apr 25 '12 at 14:05
  • I can see where you're coming from, but it would be far less efficient to loop single shifts. I highly doubt that java has bit shifting implemented in this way. – Liam George Betsworth Apr 28 '12 at 11:47
1

why not write a simple benchmark and see it for yourself?

    long start1 = System.nanoTime();
    for (int k = 0; k < 100000000; k++) {
        int x = 5;
        x = x << 16;
    }
    long stop1 = System.nanoTime();

    long start2 = System.nanoTime();
    for (int k = 0; k < 100000000; k++) {
        int x = 5;
        for (int i = 0; i < 16; ++i) {
            x = x << 1;
        }
    }
    long stop2 = System.nanoTime();

    long start3 = System.nanoTime();
    for (int k = 0; k < 100000000; k++) {
        int x = 5;
        for (int i = 0; i < 16; ++i) {
            x = x * 2;
        }
    }
    long stop3 = System.nanoTime();

    System.out.println(stop1 - start1);
    System.out.println(stop2 - start2);
    System.out.println(stop3 - start3);
meliniak
  • 762
  • 1
  • 9
  • 16
  • 7
    A good reason not to "write a simple benchmark and see it for yourself" is that it's [*very, very easy* to get a microbenchmark **wrong**](http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java). – Joachim Sauer Apr 25 '12 at 12:47
0

I would use the first.
But It's probably pretty same.
Maybe first is faster because I think JVM has a built in instruction for this, and as for the other, it has to read several instructions witch can be slower.

You shouldn't think a lot about these minor "Speed Improvement" things. The speed of those little arithmetic/logic operation is enormous and it wont impact the program performance much.

SmRndGuy
  • 1,719
  • 5
  • 30
  • 49