
I am writing a variety of equivalent programs in Java and C++ to compare the two languages for speed. Those programs employ heavy mathematical computations in a loop.

Interestingly enough I find that C++ beats Java when I use -O3. When I use -O2 Java beats C++.

Which g++ compiler optimization should I use to reach a conclusion about my comparisons?

I know this is not as simple to conclude as it sounds, but I would like to have some insights about latency/speed comparisons between Java and C++.

LatencyGuy
  • Well that depends on what you're trying to reach a conclusion *about*... what are you actually trying to determine? – Jon Skeet Jul 05 '15 at 07:14
  • C++ doesn't support JIT compilation, which really proves itself in long running applications. I don't think there's a legit way to compare C++ with Java, seeing how the JVM needs to warm up (which is where C++ could slam Java), and it can perform optimizations only possible by analyzing what's being executed at runtime (where Java slams C++ IMO) – Vince Jul 05 '15 at 07:14
  • The effect of -O3 and -O2 depends VERY much on the actual code you are compiling (and, of course, on which compiler). Unfortunately, for very small benchmarks, Java will look much more favourable than for large benchmarks, since Java's "just in time" optimisations need to be fast (otherwise too much time is lost in optimising, compared to what is saved at runtime), meaning that complex optimisations that take a long time on large code do not get done. – Mats Petersson Jul 05 '15 at 07:16
  • Also bear in mind, as Vince hints at, that a lot of optimisation in Java is "profile driven optimisation" - meaning it understands what path the code "usually takes", and whether loops are "long or short", for example. – Mats Petersson Jul 05 '15 at 07:18
  • @JonSkeet Hi John. I edited my question with _I know this is not as simple to conclude as it sounds, but I would like to have some insights about latency/speed comparisons between Java and C++._ – LatencyGuy Jul 05 '15 at 16:53
  • You are not comparing languages, but specific implementations of the compilers/runtimes on specific examples. – Michel Billaud Jul 05 '15 at 16:56
  • @VinceEmigh “C++ doesn't support JIT compilation, which really proves itself in long running applications” — [citation needed]. In fact, JIT compilation only benefits limited applications beyond the static optimisation performed by a C++ compiler. And of course C++ compilers can do profile-guided optimisations, which does much the same as a JIT. – Konrad Rudolph Jul 05 '15 at 16:59
  • Maybe you can find answers on the Programmers network. Your question sounds interesting for that community I think. – Ely Jul 05 '15 at 17:00
  • You still haven't really explained what you're trying to reach a conclusion about. If you're trying to compare the behaviour of Java with g++ with O2 optimizations, use that. If you're trying to compare the behaviour of Java with g++ with O3 optimizations, use that. If you're trying to compare Java with C++ in general, that's a fruitless task. – Jon Skeet Jul 05 '15 at 17:03
  • I think the question does not make sense. Would you go into a race with your car on flat tyres, or with the handbrake pulled at the start? No. So you should give both languages their ideal setup. – Marged Jul 05 '15 at 18:07

2 Answers


Interestingly enough I find that C++ beats Java when I use -O3. When I use -O2 Java beats C++.

-O3 will certainly beat -O2 in microbenchmarks, but when you benchmark a more realistic application (such as a FIX engine) you will see that -O2 beats -O3 in terms of performance.

As far as I know, -O3 does a very good job compiling small, mathematical pieces of code, but for larger and more realistic applications it can actually be slower than -O2. By trying to aggressively optimize everything (i.e. inlining, vectorization, etc.), the compiler produces huge binaries that lead to CPU cache misses (in particular, instruction cache misses). That's one of the reasons the HotSpot JIT chooses not to optimize big methods and/or non-hot methods.

One important thing to notice is that the JIT uses methods as independent units eligible for optimization. In your previous questions, you have the following code:

#include <iostream>
#include <string>
#include <chrono>

using namespace std;

// Assumed helper: get_nano_ts() is referenced but not shown in the
// question; a std::chrono-based definition is supplied here.
long long get_nano_ts() {
    return chrono::duration_cast<chrono::nanoseconds>(
        chrono::steady_clock::now().time_since_epoch()).count();
}

int main(int argc, char* argv[]) {

    int iterations = stoi(argv[1]);
    int load = stoi(argv[2]);

    long long x = 0;

    for(int i = 0; i < iterations; i++) {

        long long start = get_nano_ts(); // START clock

        for(int j = 0; j < load; j++) {
            if (i % 4 == 0) {
                x += (i % 4) * (i % 8);
            } else {
                x -= (i % 16) * (i % 32);
            }
        }

        long long end = get_nano_ts(); // STOP clock

        // (omitted for clarity)
    }

    cout << "My result: " << x << endl;
    return 0;
}

But this code is JIT-unfriendly, because the hot block of code is not in its own method. For major JIT gains, you should have placed the body of the inner loop in its own method. As it stands, the method executes a hot block of code instead of calling a hot method, and since the method containing the for loop is probably called only once, the JIT will not do anything about it.
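A minimal sketch of that refactoring (the class name, method name, and fixed argument values are mine, standing in for the command-line parsing and timing calls):

```java
// JIT-friendly version: the hot inner loop lives in its own small method,
// which HotSpot can profile and compile as an independent unit.
public class HotMethodDemo {

    // Small and called once per outer iteration, so it quickly becomes
    // "hot" and eligible for aggressive optimization.
    static long doLoad(int i, int load, long x) {
        for (int j = 0; j < load; j++) {
            if (i % 4 == 0) {
                x += (i % 4) * (i % 8);
            } else {
                x -= (i % 16) * (i % 32);
            }
        }
        return x;
    }

    public static void main(String[] args) {
        int iterations = 1000; // fixed values instead of argv parsing
        int load = 1000;
        long x = 0;
        for (int i = 0; i < iterations; i++) {
            x = doLoad(i, load, x); // the START/STOP clock calls would wrap this
        }
        System.out.println("My result: " + x);
    }
}
```

The behaviour is identical to the original loop; only the compilation unit boundary changes.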

When comparing Java with C++ for speed should I compile the C++ code with -O3 or -O2?

Well, if you use -O3 for microbenchmarks you will get amazingly fast results that will be unrealistic for larger and more complex applications. That's why I think the judges use -O2 instead of -O3. For example, our garbage-free Java FIX engine is faster than C++ FIX engines, and I have no idea whether they are compiling with -O0, -O1, -O2, -O3 or a mix of them by linking separately compiled pieces.

In theory it is possible to selectively compartmentalize an entire C++ application into separately compiled pieces, choose which ones are compiled with -O2 and which with -O3, and then link everything into one ideal binary executable. But in reality, how feasible is that?

The approach HotSpot chooses is much simpler. It says:

Listen, I am going to consider each method as an independent unit of execution instead of any block of code anywhere. If that method is hot enough (i.e. called often) and small enough I will try to aggressively optimize it.

That of course has the drawback of requiring code warmup, but it is much simpler and produces the best results most of the time for realistic/large/complex applications.
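As a rough illustration of that warmup requirement, a JVM benchmark typically calls the hot method many times before taking any measurement, so the timed run executes JIT-compiled rather than interpreted code (names and iteration counts here are arbitrary):

```java
// Sketch: warm up a hot method before measuring it, so HotSpot has had
// a chance to profile and compile it.
public class WarmupDemo {

    static long work(int n) {
        long s = 0;
        for (int i = 0; i < n; i++) {
            s += (i % 7) * (i % 13);
        }
        return s;
    }

    public static void main(String[] args) {
        // Warmup phase: enough calls for the method to cross HotSpot's
        // compilation threshold.
        for (int i = 0; i < 20_000; i++) {
            work(1_000);
        }
        // Only now take the measurement.
        long start = System.nanoTime();
        long result = work(1_000_000);
        long elapsed = System.nanoTime() - start;
        System.out.println("result=" + result + " nanos=" + elapsed);
    }
}
```

A static C++ binary needs no such phase, which is exactly the "warmup" asymmetry mentioned in the comments above.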

And last but not least, you should probably consider this question if you want to compile your entire application with -O3: When can I confidently compile program with -O3?

rdalmeida
  • Actually HotSpot doesn't necessarily use methods as the smallest unit, it can optimize loops. A rather stupid thing that complicates the whole architecture and really only profits microbenchmarks, but alas marketing. [on-stack replacement](http://www.azulsystems.com/blog/cliff/2011-11-22-what-the-heck-is-osr-and-why-is-it-bad-or-good). An excellent answer and this is nothing but nitpicking - interesting to some I hope though. – Voo Jul 05 '15 at 19:09
  • @Voo We got major improvements when we extracted loop blocks into their own methods, at least on Java 7. Could it be that HotSpot optimizes methods more aggressively? – rdalmeida Jul 05 '15 at 19:18
  • Definitely! OSR can even be detrimental in some situations as the linked article explains much better than I ever could. It's a complicated topic, but having many smaller methods is generally a great idea not just for performance and the downsides are rare. – Voo Jul 05 '15 at 22:18

If possible, compare it against both, since -O2 and -O3 are both options available to the C++ developer. Sometimes -O2 will win. Sometimes -O3 will win. If you have both available, that's just more information which can be used to support whatever you're trying to accomplish by doing these speed comparisons.

Dogs