I do not want to start yet another pointless flamewar about whether Java or C++ is the better language in general. I want to know whether a comparison I did for a specific task is fair and the measured data correct.
We need to decide whether to use Java or C++ for our next project. I'm in the C++ camp but I want to have solid arguments for my case. Our application is a special and has the following needs:
- The program must run reasonably fast and be reasonably memory efficient. We do not care about the last 20% performance. However, a 10x performance difference is a show stopper.
- We have lots of arrays. We do not know their size upfront. It is therefore important that arrays can grow at the back in amortized O(1) running time.
- The elements in the arrays consist of a small number of basic data types. The typical example is a tuple of integers or floats.
- The arrays can get large. 10^6 elements are standard. We have applications with 10^7 elements and supporting 10^8 would be great.
I implemented a toy program in C++ and in Java. First, I present the C++ version:
#include <iostream>
#include <vector>
#include <cstdlib>
using namespace std;
struct Point{
float x, y;
};
int main(int argc, char*argv[]){
int n = atoi(argv[1]);
vector<Point>arr;
for(int i=0; i<n; ++i){
Point p;
p.x = i;
p.y = i+0.5f;
arr.push_back(p);
}
float dotp = 0;
for(int i=0; i<n; ++i)
dotp += arr[i].x * arr[i].y;
cout << dotp << endl;
}
Next is the Java version that does the same thing:
import java.util.*;
class Point{
public float x, y;
}
class Main{
static public void main(String[]args){
int n = Integer.parseInt(args[0]);
ArrayList<Point> arr = new ArrayList<Point>();
for(int i=0; i<n; ++i){
Point p = new Point();
p.x = i;
p.y = i+0.5f;
arr.add(p);
}
float dotp = 0;
for(int i=0; i<n; ++i)
dotp += arr.get(i).x * arr.get(i).y;
System.out.println(dotp);
}
}
I pass the number of elements using the commandline to the program to prevent the optimizer from executing the program during compilation. The computed value is not useful. The only interesting question is how fast the programs run and how much memory they use. I start with C++:
$ g++ --version
g++ (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
$ g++ -O3 test.cpp -o test
$ /usr/bin/time ./test 1000000
3.33381e+17
0.01user 0.00system 0:00.02elapsed 100%CPU (0avgtext+0avgdata 10084maxresident)k
0inputs+0outputs (0major+2348minor)pagefaults 0swaps
$ /usr/bin/time ./test 10000000
3.36984e+20
0.08user 0.01system 0:00.09elapsed 100%CPU (0avgtext+0avgdata 134380maxresident)k
0inputs+0outputs (0major+4074minor)pagefaults 0swaps
$ /usr/bin/time ./test 100000000
2.42876e+23
0.77user 0.09system 0:00.87elapsed 99%CPU (0avgtext+0avgdata 1050400maxresident)k
0inputs+0outputs (0major+6540minor)pagefaults 0swaps
The "user" time is how long the program ran. For 10^6 elements, it ran for 0.01 sec, for 10^7 elements 0.08 sec, and for 10^8 elements 0.77 sec. "maxresident" is the amount of physically memory in kilobyte that the kernel gave the program. For 10^6 its 10 MB, for 10^7 its 132 MB, and for 10^8 its 1 GB.
The memory consumption sounds right. An array with x elements needs sizeof(float)*2*x=8*x bytes of memory. For 10^6 elements that is about 8MB, for 10^7 about 76MB, and for 10^8 about 762 MB.
Next, I run the Java program:
$ javac -version
javac 1.6.0_41
$ javac Main.java
$ java -version
java version "1.7.0_131"
OpenJDK Runtime Environment (IcedTea 2.6.9) (7u131-2.6.9-0ubuntu0.14.04.2)
OpenJDK 64-Bit Server VM (build 24.131-b00, mixed mode)
$ /usr/bin/time java Main 1000000
3.33381168E17
0.16user 0.00system 0:00.09elapsed 173%CPU (0avgtext+0avgdata 79828maxresident)k
0inputs+64outputs (0major+4314minor)pagefaults 0swaps
$ /usr/bin/time java Main 10000000
3.3698438E20
5.23user 0.18system 0:02.07elapsed 261%CPU (0avgtext+0avgdata 424180maxresident)k
0inputs+64outputs (0major+13508minor)pagefaults 0swaps
$ /usr/bin/time java Main 100000000
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at Main.main(Main.java:14)
Command exited with non-zero status 1
3840.72user 13.06system 17:11.79elapsed 373%CPU (0avgtext+0avgdata 2281416maxresident)k
0inputs+1408outputs (0major+139893minor)pagefaults 0swaps
For 10^6 elements it needs 0.16 sec and 78 MB. For 10^7 elements it needs 5.23 sec and 414 MB. I tried to run the program for 10^8 elements but Java crashed. It used all cores of my machine (in a sequential program!) and ran for 17 min while occupying 2.2GB. My machine has 8 GB of memory.
For 10^6 elements C++ is 0.16 / 0.01 = 16 times faster and needs 78/10 = 7.8 times less memory. For 10^7 elements C++ is 5.23/0.08 = 65 times faster and needs 414/132 = 3.1 times less memory. Java did not finish on the test instance with 10^8 elements while the C++ program finished within well below a second.
For 10^6 Java seems manageable but less than ideal. For 10^7 and 10^8 it is an absolute no-go. I expected a slight performance advantage of C++ over Java but not something this drastic.
The most likely explanation is that my testing methodology is wrong or that I have a non-obvious performance bottleneck in my Java code. Another explanation would be that the OpenJDK JVM significantly lacks behind JVMs from other vendors.
Please explain to me why Java performs so bad in this benchmark. How did I unintentionally make Java look worse than it is?
Thanks