Java Math.tanh() performance

Question

I have a Java program that makes many calls to the Math.tanh() function. Out of curiosity I wanted to do a comparison with C++. Therefore I wrote two small programs, one in Java and one in C++, to test.

The Java code:

public class TestTanh { 

    public static void main(String[] args) {

        double t1 = -1.0;
        double t2 = 1.0;
        double step = 1e-8;

        double z = 0.0;
        for(double t=t1; t<=t2; t += step) {
            double y = Math.tanh(t);
            z += y;
        }
        System.out.println("Sum = " + z);
    }
}

and the C++ code:

#include <iostream>
#include <cmath>

using namespace std;

int main() {

    double t1 = -1.0;
    double t2 = 1.0;
    double step = 1e-8;

    double z = 0.0;
    for(double t=t1; t<=t2; t += step) {
        double y = tanh(t);
        z += y;
    }
    cout << "Sum = " << z << "\n";
}

Compiling and running the programs I got the following:

$ time java TestTanh
Sum = -0.41281032759865655

real    0m18.372s
user    0m17.961s
sys     0m0.109s

and

$ time ./test_tanh
Sum = -0.41281

real    0m4.022s
user    0m3.641s
sys     0m0.004s

Why does the Java program take about 5 times longer to execute? Could it be related to the JIT doing some compiling first? Or is the tanh implementation in Java slower than the one in C++?

It is a simple test that might have a trivial explanation but I have searched the web and not found an answer. My Java version is

$ java -version
java version "1.6.0_22"
Java(TM) SE Runtime Environment (build 1.6.0_22-b04-307-10M3261)
Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03-307, mixed mode)

When using the tanh function in a larger program containing other basic arithmetic operations, the difference between Java and C++ became smaller (now a factor of about 2.3). The program still calls tanh repeatedly, but now there are also other operations in the loop. I also tried the FastMath class from Apache Commons, but it was actually slower (any special settings needed?). The results for this program with identical parameters were:

C++

real    0m18.031s
user    0m18.007s
sys     0m0.007s

Java with java.lang.Math

real    0m40.739s
user    0m40.032s
sys     0m0.088s

Java with org.apache.commons.math.util.FastMath

real    0m46.717s
user    0m46.583s
sys     0m0.372s

My goal here was not to do any true benchmarking, I just wanted to see what the differences were in a practical situation when implementing the code in a straightforward way.

There are clearly slight implementation differences here since you're not getting identical results in the two languages (unless I don't know how printing a double works in C++, which is also possible). Perhaps Java has stricter (thus more expensive) requirements than the C library? — Mark Peters, Mar 01 '11 at 21:55
Can you try C++ with `cout << setprecision(17) << foo << endl;`? — Mark Peters, Mar 01 '11 at 22:00
You should check out Nailgun, a client/server setup that eliminates the overhead of starting the JVM. http://martiansoftware.com/nailgun/ — Peter C, Mar 01 '11 at 22:04
With `cout << setprecision(17)` I get **C++ : Sum = -0.4128103271477388** **Java: Sum = -0.41281032759865655** So there is a difference in the 10th digit. Which is most correct I don't know. — Peter, Mar 03 '11 at 13:23
Actually the sum should ideally be zero, as tanh is antisymmetric around zero. But with all these small steps, any small errors in both the step variable `t` and tanh itself will add up. I should probably also use something like `t<(t2+step/2)` as the upper limit. But this is not really related to the original performance issue. — Peter, Mar 03 '11 at 13:42
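
The loop-bound concern in the last comment can also be handled by driving the loop with an integer counter and deriving `t` from it. The following is only a sketch of that idea: the class name `TanhSum`, the method `sumTanh`, and the choice of 2,000,000 intervals (a step of 1e-6, coarser than the question's 1e-8 to keep the run short) are assumptions made for illustration, not part of the original programs.

```java
// Hypothetical variant of the question's loop; names and sizes are illustrative.
class TanhSum {

    // Sums tanh over n + 1 evenly spaced points from t1 to t2 inclusive.
    static double sumTanh(double t1, double t2, int n) {
        double step = (t2 - t1) / n;
        double sum = 0.0;
        for (int i = 0; i <= n; i++) {
            // Derive t from the integer index so rounding errors from a
            // repeated "t += step" cannot build up or drop the last point.
            double t = t1 + i * step;
            sum += Math.tanh(t);
        }
        return sum;
    }

    public static void main(String[] args) {
        // 2_000_000 intervals over [-1, 1] corresponds to a step of 1e-6.
        System.out.println("Sum = " + sumTanh(-1.0, 1.0, 2_000_000));
    }
}
```

Because the sample points are now nearly symmetric around zero, the sum should land much closer to zero than with the accumulated `t += step` loop.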

score 3 · Answer 1 · answered Mar 01 '11 at 22:10

According to this, OpenJDK 6 (and I guess, Sun's JDK 6) uses strict math, which sacrifices performance for correctness. That might be your problem. I am pretty sure that no decent JVM spends 18 seconds starting up. You should use a Math library with performance in mind, or change your JVM.

gpeche

Do you have a suggestion for another Math library? – Peter Mar 02 '11 at 09:33
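
One library-free option to experiment with is to compute tanh from the identity tanh(x) = expm1(2x) / (expm1(2x) + 2), using the standard `Math.expm1`. This is a sketch only: the class `TanhAlt` and the method `tanhViaExpm1` are hypothetical names, the cutoff of 22 is an assumption chosen because tanh rounds to ±1.0 in double precision well before that point, and nothing here shows that this version is faster or that it meets the accuracy bound `Math.tanh` guarantees. It would have to be measured and checked for the program in question.

```java
// Hypothetical alternative to Math.tanh; speed and accuracy are untested assumptions.
class TanhAlt {

    // tanh(x) = (e^(2x) - 1) / (e^(2x) + 1) = expm1(2x) / (expm1(2x) + 2)
    static double tanhViaExpm1(double x) {
        if (Double.isNaN(x)) {
            return x;
        }
        // For large |x| the result rounds to +/-1.0, and expm1(2x) would
        // eventually overflow to infinity, so return the limit directly.
        if (Math.abs(x) > 22.0) {
            return Math.copySign(1.0, x);
        }
        double e = Math.expm1(2.0 * x);
        return e / (e + 2.0);
    }

    public static void main(String[] args) {
        // Print both versions side by side for a few sample arguments.
        for (double x = -2.0; x <= 2.0; x += 0.5) {
            System.out.println(x + ": " + tanhViaExpm1(x) + " vs " + Math.tanh(x));
        }
    }
}
```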

score 2 · Answer 2 · answered Mar 01 '11 at 22:08

It may or may not come from the fact that the result in Java is quite precisely defined. The following can cost time unless the CPU does it in exactly the same way:

If the argument is zero, then the result is a zero with the same sign as the argument.

If the argument is positive infinity, then the result is +1.0.

The computed result must be within 2.5 ulps of the exact result.
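
The first two requirements quoted above can be observed directly. The sketch below is illustrative only: the class `TanhSpecialCases` and the helper `isNegativeZero` are hypothetical names introduced here. The 2.5 ulp bound cannot be checked without an exact reference value, so the code only prints the size of one ulp near a sample result.

```java
// Hypothetical demo of the special cases documented for Math.tanh.
class TanhSpecialCases {

    // True only for -0.0; Double.compare distinguishes -0.0 from +0.0.
    static boolean isNegativeZero(double d) {
        return Double.compare(d, -0.0) == 0;
    }

    public static void main(String[] args) {
        // A zero argument keeps its sign.
        System.out.println(isNegativeZero(Math.tanh(-0.0)));      // prints true
        System.out.println(isNegativeZero(Math.tanh(0.0)));       // prints false

        // Infinite arguments map to exactly +1.0 and -1.0.
        System.out.println(Math.tanh(Double.POSITIVE_INFINITY));  // prints 1.0
        System.out.println(Math.tanh(Double.NEGATIVE_INFINITY));  // prints -1.0

        // Size of one ulp at a typical result; the documented error bound is 2.5 of these.
        System.out.println(Math.ulp(Math.tanh(0.5)));
    }
}
```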

In C, you only know that something like tanh(x) comes out. There are standards, and there are standard-conforming compilers, but are you using such a compiler?

As said in the other answers, this is not how benchmarks should be done. For comparing Java programs only, I recommend that you grab Caliper and try it.

maaartinus

Yes, thanks, I realize that the code I wrote surely is not the way to do a real benchmark. But then again I was surprised by the factor 5 difference; I was expecting (hoping for) something less than 2. I am running Mac OS X 10.6.6. The C++ code was compiled without any flags (`i686-apple-darwin10-g++-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5646) (dot 1)`). The Java code was also compiled without flags (`javac 1.6.0_22`). I will check out caliper. – Peter Mar 02 '11 at 08:01
I am surprised, too. I know there are issues regarding trigonometric functions (because of the FPU computing nonsense for big arguments), but tanh is quite a normal function. Try some more functions like exp and sqrt, I'm really curious. – maaartinus Mar 02 '11 at 14:19

score 1 · Answer 3 · answered Mar 01 '11 at 21:59

When doing performance tests like this, you should always allow for a "warmup period". So you shouldn't start measuring before having run the calculations a couple of hundred or thousand times. This way the Java HotSpot compiler will have compiled what it considers to be frequently executed code, and the native binary will have put its most frequently used variables in processor registers.

My guess is that close to 100% of the difference in your result is due to the slow startup time of the JVM. Java takes ages to start compared to a natively compiled program.

It would be interesting to see a measurement actually done in code, after a "warmup period".

That was my initial guess too. But I did also measure time as suggested by @japher and @MarkPeters and there was not much difference. I also started the timer after half of the looping and the time was about half. — Peter, Mar 02 '11 at 08:08
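
An in-code measurement of the kind discussed here could look roughly like the following. It is a minimal sketch, not the code anyone in this thread actually ran: the class `TanhTiming`, its method names, the number of warm-up rounds, and the step sizes are all assumptions, and a harness such as Caliper would give more trustworthy numbers.

```java
// Hypothetical timing harness; names, round counts and step sizes are illustrative.
class TanhTiming {

    // The loop from the question over [-1, 1] with a configurable step.
    static double sumTanh(double step) {
        double z = 0.0;
        for (double t = -1.0; t <= 1.0; t += step) {
            z += Math.tanh(t);
        }
        return z;
    }

    // Times a single call in milliseconds; JVM startup is excluded
    // because the clock is only read once the program is running.
    static double timeMillis(double step) {
        long start = System.nanoTime();
        double z = sumTanh(step);
        long elapsed = System.nanoTime() - start;
        // Use the result so the JIT cannot discard the loop as dead code.
        if (Double.isNaN(z)) {
            throw new IllegalStateException("unexpected NaN");
        }
        return elapsed / 1e6;
    }

    public static void main(String[] args) {
        // Warm-up: give HotSpot a chance to compile sumTanh before measuring.
        for (int i = 0; i < 20; i++) {
            sumTanh(1e-5);
        }
        for (int run = 1; run <= 3; run++) {
            System.out.printf("run %d: %.1f ms%n", run, timeMillis(1e-6));
        }
    }
}
```

If the measured runs stay roughly proportional to the number of iterations, as the comment above reports, startup and JIT compilation are unlikely to explain the gap.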

score 1 · Answer 4 · answered Mar 20 '11 at 14:59

For most C++ compilers, tanh is an intrinsic function (a built-in function), meaning there is no function call overhead because the compiler adds specific assembler instructions instead of a library call.

Some Java VMs also support intrinsic functions, for example String.length() seems to be intrinsic in the Sun JVM. For Java, that means the hotspot compiler replaces the function call with special assembler instructions (at runtime). This is a bit different from C and C++ where the compiler does that (before running the program).

However, in Java, Math.tanh doesn't seem to be intrinsic. Therefore, it's slower.

score 0 · Answer 5 · edited May 23 '17 at 11:55

The main method is called only once, so the JVM might not compile it to native code. First read "How do I write a correct micro-benchmark in Java?"

If there is still a big difference after the micro benchmark is well written, then a possible reason is JNI overhead. The Math.tanh() method and others have been implemented in the JVM as native code. The documentation for java.lang.StrictMath says that they use the fdlibm library, which is written in C, so you might do well to use that library in your tests, so that you would not be comparing two different C libraries.
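
To see whether `Math.tanh` and `StrictMath.tanh` behave differently on a given JVM, the two can be run through the same loop. This is a rough sketch under stated assumptions: the class `MathVsStrictMath` and its methods are hypothetical names, the step size is arbitrary, and a single timed pass like this is only indicative. On many JVMs `Math.tanh` may simply delegate to `StrictMath.tanh`, in which case the timings would be expected to be similar.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical comparison of Math.tanh and StrictMath.tanh; not a rigorous benchmark.
class MathVsStrictMath {

    // Sums f(t) for t from -1 to 1 in increments of step.
    static double sum(DoubleUnaryOperator f, double step) {
        double z = 0.0;
        for (double t = -1.0; t <= 1.0; t += step) {
            z += f.applyAsDouble(t);
        }
        return z;
    }

    // Runs a few untimed warm-up passes, then times one pass and prints it.
    static void report(String label, DoubleUnaryOperator f, double step) {
        for (int i = 0; i < 5; i++) {
            sum(f, step);
        }
        long start = System.nanoTime();
        double z = sum(f, step);
        long elapsed = System.nanoTime() - start;
        System.out.printf("%s: sum = %s, %.1f ms%n", label, z, elapsed / 1e6);
    }

    public static void main(String[] args) {
        double step = 1e-6; // coarser than the question's 1e-8 to keep the run short
        report("Math.tanh", Math::tanh, step);
        report("StrictMath.tanh", StrictMath::tanh, step);
    }
}
```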
