Background

Having noticed that the execution of a Java program I am working on was slower than expected, I decided to tinker with the area of code which I thought might be causing the issue - a call to Math.pow(x, 2) from within a for loop. Contrary to other questions on this site, a simple benchmark I created (code at end) found that replacing Math.pow(x, 2) with x*x actually sped the loop up by almost 70 times:

x*x: 5.139383ms
Math.pow(x, 2): 334.541166ms

Note that I am aware that the benchmark is not perfect, and that the values should certainly be taken with a pinch of salt - the aim of the benchmark was to get a ballpark figure.

The Question

While the benchmark gave interesting results, it did not model my data accurately, as my data consists largely of 0s. Thus a more accurate test was to run the benchmark without the for loop marked optional. According to the Javadoc for Math.pow():

If the first argument is positive zero and the second argument is greater than zero, or the first argument is positive infinity and the second argument is less than zero, then the result is positive zero.
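
The documented special case is easy to confirm in isolation (a quick sanity check, separate from the benchmark; the class name here is just illustrative):

```java
public class PowZeroCheck {
    public static void main(String[] args) {
        // pow(+0.0, y) for y > 0 returns +0.0, as the Javadoc states
        System.out.println(Math.pow(0.0, 2));                        // 0.0
        // pow(+Infinity, y) for y < 0 also returns +0.0
        System.out.println(Math.pow(Double.POSITIVE_INFINITY, -1)); // 0.0
    }
}
```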

So one would expect this benchmark to run faster, right? In reality, however, it is considerably slower still:

x*x: 4.3490535ms
Math.pow(x, 2): 3082.1720006ms

Sure, one might expect the Math.pow() code to run a little slower than the simple x*x code, since it needs to handle the general case, but 700x slower? What is going on? And why is the 0 case so much slower than the Math.random() case?

UPDATE: Updated code and times based on @Stephen C's suggestion. This however made little difference.

Code used to benchmark

Note that reordering the two tests makes negligible difference.

public class Test {
    public Test(){
        int iterations = 100;
        double[] exampleData = new double[5000000];
        double[] test1Results = new double[iterations];
        double[] test2Results = new double[iterations];

        //Optional
        for (int i = 0; i < exampleData.length; i++) {
            exampleData[i] = Math.random();
        }

        for (int i = 0; i < iterations; i++) {
            test1Results[i] = test1(exampleData);
            test2Results[i] = test2(exampleData);
        }
        System.out.println("x*x: " + calculateAverage(test1Results) / 1000000 + "ms");
        System.out.println("Math.pow(x, 2): " + calculateAverage(test2Results) / 1000000 + "ms");
    }

    private long test1(double[] exampleData){
        double total = 0;
        long startTime;
        long endTime;
        startTime = System.nanoTime();
        for (int j = 0; j < exampleData.length; j++) {
            total += exampleData[j] * exampleData[j];
        }
        endTime = System.nanoTime();
        System.out.println(total);
        return endTime - startTime;
    }

    private long test2(double[] exampleData){
        double total = 0;
        long startTime;
        long endTime;
        startTime = System.nanoTime();
        for (int j = 0; j < exampleData.length; j++) {
            total += Math.pow(exampleData[j], 2);
        }
        endTime = System.nanoTime();
        System.out.println(total);
        return endTime - startTime;
    }

    private double calculateAverage(double[] array){
        double total = 0;
        for (int i = 0; i < array.length; i++) {
            total += array[i];
        }
        return total/array.length;
    }

    public static void main(String[] args){
        new Test();
    }
}
Hungry
  • With `x * x` you avoid a method call which is quite expensive in contrast to a simple multiplication. – user Dec 09 '15 at 21:58
  • Running your class with Oracle's JRE (v1.8), I don't see much difference between the `.pow(...)` call and `x*x`. The call to `.pow(...)` is slightly faster with random values, and slightly slower with all zeros. Also, you should make iterations at least 10000000 or more because 100 says very little. – Bart Kiers Dec 09 '15 at 22:09
  • See also http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java – Alexis C. Dec 09 '15 at 22:10
  • I thought `Math.pow()` would be a fairly simple thing. Tracing down the code (OpenJDK 14) ended me up in `compute()` of `java.lang.FdLibm.Pow` which is a monster of a method... Doubles apparently can be very tricky with NaN's, Infinities, and what have you... I got here while translating a C-program (graphics handling) where they do tons of `x*x*x` and I was wondering if I shouldn't replace them with `pow`. Now I don't think so... (i.e. don't fix it if it ain't broke)... I think `Math.pow` is still handy for dealing with unknown doubles though... – Erk Jan 02 '21 at 06:33

4 Answers


Though this is a bad benchmark, it luckily reveals an interesting effect.

The numbers indicate that you are apparently running the benchmark under a "Client" VM. Its JIT compiler (known as the C1 compiler) is not very strong and lacks many optimizations, so it is no wonder that it does not perform as well as one might expect.

  • The Client VM is not smart enough to eliminate the Math.pow call even though it has no side effects.
  • Moreover, it does not have a specialized fast path for either Y=2 or X=0. At least, it did not until Java 9: this was recently fixed in JDK-8063086 and then further optimized in JDK-8132207.

But the interesting thing is that Math.pow is indeed slower for X=0 under the C1 compiler!

But why? Because of implementation details.

x86 architecture does not provide a hardware instruction to compute X^Y. But there are other useful instructions:

  • FYL2X computes Y * log₂X
  • F2XM1 computes 2^X - 1

Thereby, X^Y = 2^(Y * log₂X). Since log₂X is defined only for X > 0, FYL2X raises an exception for X=0 and returns -Inf. Thus it happens that X=0 is handled on a slow exceptional path rather than on a specialized fast path.
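
The identity can be illustrated in plain Java (powViaLog2 is a hypothetical name for this sketch; the real implementation works on x87 registers, not through Math):

```java
public class PowIdentity {
    // Sketch of the x87 approach: x^y == 2^(y * log2(x))
    static double powViaLog2(double x, double y) {
        double log2x = Math.log(x) / Math.log(2); // the FYL2X-style step
        return Math.pow(2.0, y * log2x);          // the 2^(...) step
    }

    public static void main(String[] args) {
        System.out.println(powViaLog2(3.0, 2.0)); // ~9.0, matches Math.pow(3, 2)
        // For x == 0 the logarithm is -Infinity, forcing the slow exceptional path
        System.out.println(Math.log(0.0));        // -Infinity
    }
}
```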

So what to do?

First of all, stop using the Client VM, especially if you care about performance. Switch to the latest JDK 8 in its 64-bit flavour and you'll get the benefit of the C2 optimizing JIT compiler, which of course handles Math.pow(x, 2) and the like nicely. Then write a correct benchmark using a proper tool like JMH.
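
For reference, a rough standalone approximation of what JMH does (warmup iterations plus a consumed result) might look like the sketch below; the class and field names are just illustrative, and JMH itself handles warmup, forking, and dead-code elimination far more reliably:

```java
public class PowVsMultiply {
    static double blackhole = 0; // consumed result, so the JIT cannot drop the work

    static long time(double[] data, boolean usePow) {
        long start = System.nanoTime();
        for (double x : data) {
            blackhole += usePow ? Math.pow(x, 2) : x * x;
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        double[] data = new double[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = Math.random();
        // Warm up so the JIT compiles both paths before measuring
        for (int i = 0; i < 10; i++) { time(data, false); time(data, true); }
        System.out.println("x*x:           " + time(data, false) + " ns");
        System.out.println("Math.pow(x,2): " + time(data, true) + " ns");
        System.out.println(blackhole); // keep the accumulator observably live
    }
}
```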

apangin

Possibly related to this regression in JDK 7: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8029302

From the bug report:

There is a Math.pow performance regression where power of 2 inputs run slower than other values.

I've replaced all calls to Math.pow() because of this:

public static double pow(final double a, final double b) {
    if (b == 2.0) {
        return (a * a);
    } else {
        return Math.pow(a, b);
    }
}

According to the bug report, it was fixed in JDK 8, which fits @BartKiers' comment above.

martinez314

While the bug report that @whiskeyspider found is relevant, I don't think it is the entire explanation. According to the bug report, the regression gave a roughly 4x slowdown. But here we are seeing a roughly 700x slowdown. The difference is too large to ignore.

I think that part of the problem we are seeing here is the benchmark itself. Look at this:

        for (int j = 0; j < exampleData.length; j++) {
            double output = exampleData[j] * exampleData[j];
        }

The statement in the body assigns to a local variable that is not used. It can be optimized away by the JIT compiler. (Indeed, the entire loop could be optimized away ... though empirically that doesn't appear to be happening here.)

By contrast:

        for (int j = 0; j < exampleData.length; j++) {
            double output = Math.pow(exampleData[j], 2);
        }

Unless the JIT compiler knows that pow is side-effect free, that cannot be optimized. Since the implementation of pow is in native code, that knowledge would have to be imparted in the way that the method is made "intrinsic" ... under the hood. From the bug report analysis, changes in "intrinsification" between different Java versions / releases are the root cause of the regression. My suspicion is that the flaw in the OP's benchmark is amplifying the effect.

The fix is to make sure that the output value is used, so that the JIT compiler can't optimize it away; e.g.

        double blackhole = 0;  // declared at start ...
        ...
        for (int j = 0; j < exampleData.length; j++) {
            blackhole += exampleData[j] * exampleData[j];
        }
        ...
        for (int j = 0; j < exampleData.length; j++) {
            blackhole += Math.pow(exampleData[j], 2);
        }

Reference: https://stackoverflow.com/a/513259/139985 ... especially Rule #6.

Stephen C
  • I'm now running this code on a different machine which is running Java 8 and indeed making the changes you suggest brings the results much more in line. Will test on the original Java 7 machine later to decide whether to accept the answer. – Hungry Dec 10 '15 at 09:42

It takes much longer to call a method from the Math class than to use a simple operator (when possible). This is because the program has to pass the input to Math.method(), the Math class then performs the operation, and finally it returns the calculated value. All of this takes more processing power than just using a basic operator like *, /, +, or -.

E.Arrowood
  • If the method gets inlined or is intrinsic (which applies to most methods in the Math class) this is not true. – assylias Dec 09 '15 at 23:07