I have a Java program that makes many calls to the Math.tanh() function. Out of curiosity I wanted to compare its speed with C++, so I wrote two small test programs, one in Java and one in C++.
The Java code:
public class TestTanh {
    public static void main(String[] args) {
        double t1 = -1.0;
        double t2 = 1.0;
        double step = 1e-8;
        double z = 0.0;
        // Sum tanh(t) over [t1, t2] in increments of step.
        for (double t = t1; t <= t2; t += step) {
            double y = Math.tanh(t);
            z += y;
        }
        System.out.println("Sum = " + z);
    }
}
and the C++ code:
#include <iostream>
#include <cmath>
using namespace std;

int main() {
    double t1 = -1.0;
    double t2 = 1.0;
    double step = 1e-8;
    double z = 0.0;
    // Sum tanh(t) over [t1, t2] in increments of step.
    for (double t = t1; t <= t2; t += step) {
        double y = tanh(t);
        z += y;
    }
    cout << "Sum = " << z << "\n";
}
Compiling and running the programs I got the following:
$ time java TestTanh
Sum = -0.41281032759865655
real 0m18.372s
user 0m17.961s
sys 0m0.109s
and
$ time ./test_tanh
Sum = -0.41281
real 0m4.022s
user 0m3.641s
sys 0m0.004s
Why does the Java program take about 5 times longer to run? Could it be related to the JIT doing some compiling first? Or is the tanh implementation in Java slower than the one in C++?
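One way I could probe the JIT question is to time the same loop several times inside a single JVM run, so that startup and compilation costs would show up mainly in the first repetition. Below is a minimal sketch of that idea. The class name TanhTiming, the run count of 5, and the coarser step of 1e-6 are my own illustrative choices and not part of the test above.

```java
public class TanhTiming {
    // Same loop as in TestTanh, wrapped in a method so it can be called repeatedly.
    static double sumTanh(double t1, double t2, double step) {
        double z = 0.0;
        for (double t = t1; t <= t2; t += step) {
            z += Math.tanh(t);
        }
        return z;
    }

    public static void main(String[] args) {
        int runs = 5;        // hypothetical number of repetitions
        double step = 1e-6;  // hypothetical step, coarser than 1e-8 to keep each run short
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            double z = sumTanh(-1.0, 1.0, step);
            long elapsedMs = (System.nanoTime() - start) / 1000000L;
            // Printing z keeps the result live so the loop cannot be discarded.
            System.out.println("Run " + i + ": sum = " + z + ", " + elapsedMs + " ms");
        }
    }
}
```

If the later runs turn out much faster than the first, warm-up is probably part of the story; if all runs take about the same time, the cost more likely lies in the tanh call itself.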
It is a simple test that might have a trivial explanation but I have searched the web and not found an answer. My Java version is
$ java -version
java version "1.6.0_22"
Java(TM) SE Runtime Environment (build 1.6.0_22-b04-307-10M3261)
Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03-307, mixed mode)
When using the tanh function in a larger program containing other basic arithmetic operations, the difference between Java and C++ became smaller (a factor of about 2.3). The program still calls tanh repeatedly, but now there are also other operations in the loop. I also tried the FastMath class from Apache Commons, but it was actually slower (are any special settings needed?). The results for this program with identical parameters were:
C++
real 0m18.031s
user 0m18.007s
sys 0m0.007s
Java with java.lang.Math
real 0m40.739s
user 0m40.032s
sys 0m0.088s
Java with org.apache.commons.math.util.FastMath
real 0m46.717s
user 0m46.583s
sys 0m0.372s
My goal here was not to do any true benchmarking. I just wanted to see what the differences were in a practical situation when implementing the code in a straightforward way.