I wrote a small benchmark in which the program creates 10^8 two-dimensional vector structures of {float, float}, stored in a std::vector, and then sums up the squares of their lengths.
Here is the C++ code:
#include <iostream>
#include <chrono>
#include <vector>
#include <array>
#include <cmath>

using namespace std;
using namespace std::chrono;

const int COUNT = pow(10, 8);

class Vec {
public:
    float x, y;

    Vec() {}

    Vec(float x, float y) : x(x), y(y) {}

    float len() {
        return x * x + y * y;
    }
};

int main() {
    vector<Vec> vecs;

    for(int i = 0; i < COUNT; ++i) {
        vecs.emplace_back(i / 3, i / 5);
    }

    auto start = high_resolution_clock::now();

    // This loop is timed
    float sum = 0;
    for(int i = 0; i < COUNT; ++i) {
        sum += vecs[i].len();
    }

    auto stop = high_resolution_clock::now();

    cout << "finished in " << duration_cast<milliseconds>(stop - start).count()
         << " milliseconds" << endl;
    cout << "result: " << sum << endl;

    return 0;
}
For which I used this makefile (g++ version 7.5.0):
build:
	g++ -std=c++17 -O3 main.cpp -o program #-ffast-math

run: build
	clear
	./program
Here is my Java code:
public class MainClass {
    static final int COUNT = (int) Math.pow(10, 8);

    static class Vec {
        float x, y;

        Vec(float x, float y) {
            this.x = x;
            this.y = y;
        }

        float len() {
            return x * x + y * y;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Vec[] vecs = new Vec[COUNT];

        for (int i = 0; i < COUNT; ++i) {
            vecs[i] = new Vec(i / 3, i / 5);
        }

        long start = System.nanoTime();

        // This loop is timed
        float sum = 0;
        for (int i = 0; i < COUNT; ++i) {
            sum += vecs[i].len();
        }

        long duration = System.nanoTime() - start;

        System.out.println("finished in " + duration / 1000000 + " milliseconds");
        System.out.println("result: " + sum);
    }
}
This was compiled and run using Java 11.0.4.
And here are the results (the average of a few runs, run on Ubuntu 18.04 64-bit):
c++: 262 ms
java: 230 ms
Here are a few things I have tried in order to make the c++ code faster:
- Use std::array instead of std::vector
- Use a plain array instead of std::vector
- Use an iterator in the for loop
However, none of the above resulted in any improvement.
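For reference, a minimal sketch of what the loop variants listed above might look like. The function names (sum_indexed, sum_iterators, sum_plain) are hypothetical and not part of the benchmark; Vec is reduced to a plain struct with the same two floats. All three variants add the terms in the same order, so they should produce the same result, and it seems plausible (an assumption, not something measured here) that an optimizing compiler emits very similar code for each.

```cpp
#include <cstddef>
#include <vector>

// Same data layout as the Vec class in the benchmark: two floats.
struct Vec {
    float x, y;
    // Squared length, as in the benchmark's len().
    float len() const { return x * x + y * y; }
};

// Index-based loop over a std::vector, as in the original benchmark.
float sum_indexed(const std::vector<Vec>& vecs) {
    float sum = 0;
    for (std::size_t i = 0; i < vecs.size(); ++i) {
        sum += vecs[i].len();
    }
    return sum;
}

// Iterator-based loop over the same std::vector.
float sum_iterators(const std::vector<Vec>& vecs) {
    float sum = 0;
    for (auto it = vecs.begin(); it != vecs.end(); ++it) {
        sum += it->len();
    }
    return sum;
}

// Loop over a plain array, given as a pointer and an element count.
float sum_plain(const Vec* data, std::size_t n) {
    float sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += data[i].len();
    }
    return sum;
}
```

Each function can be dropped into the timed section in place of the original loop, for example sum_plain(vecs.data(), vecs.size()).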
I have noticed a few interesting things:
- When I time the whole main() function (allocation + computation), C++ is much faster. However, this might be due to the warm-up time of the JVM.
- For a lower number of objects, like 10^7, C++ was slightly faster (by a few milliseconds).
- Turning on -ffast-math makes the C++ program a few times faster than Java; however, the result of the computation is slightly different. Furthermore, I read in a few posts that using this flag is unsafe.
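One likely reason the result changes under -ffast-math is that the flag permits the compiler to reassociate floating-point additions, and float addition is not associative. The sketch below illustrates that effect by hand: it compares a strictly sequential sum with a sum kept in four independent partial accumulators, which is roughly the shape a vectorized reduction takes. The function names are hypothetical, and whether this is exactly what g++ does to the benchmark loop is an assumption, not something verified against the generated assembly.

```cpp
#include <cstddef>
#include <vector>

// Adds the elements strictly left to right, like the timed benchmark loop.
float sum_sequential(const std::vector<float>& v) {
    float sum = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        sum += v[i];
    }
    return sum;
}

// Keeps four independent partial sums and combines them at the end.
// This changes the order of the additions, so rounding can differ.
float sum_four_lanes(const std::vector<float>& v) {
    float acc[4] = {0, 0, 0, 0};
    std::size_t i = 0;
    for (; i + 4 <= v.size(); i += 4) {
        acc[0] += v[i];
        acc[1] += v[i + 1];
        acc[2] += v[i + 2];
        acc[3] += v[i + 3];
    }
    // Combine the lanes pairwise, rounding each step to float.
    float left = acc[0] + acc[1];
    float right = acc[2] + acc[3];
    float sum = left + right;
    // Add any leftover elements when the size is not a multiple of 4.
    for (; i < v.size(); ++i) {
        sum += v[i];
    }
    return sum;
}
```

For the input {1e8f, 1.0f, -1e8f, 1.0f}, assuming IEEE 754 single precision with no excess intermediate precision, sum_sequential returns 1 while sum_four_lanes returns 0, because 1e8f + 1.0f rounds back to 1e8f. Both orders are valid roundings of the same mathematical sum; they simply lose different low-order bits.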
Can I somehow modify my C++ code to make it as fast as, or faster than, the Java version in this case?