1

The following small programs which compute the sum of all numbers from 1 to 1 billion we're written in C++ and Java as closely as I could write them. My understanding is that C++ is the "faster" language, but the java version of this code completes in ~.5 seconds vs ~3 seconds for C++.

C++ (GCC Compiler):

int main(){
    long long x = 0;
    for (long i=0;i<1000000001;i++){
    x=x+i;
    }
    cout << x << endl;
    return 0;
}

JAVA:

public class Main {
    public static void main(String[] args)  {
        long x=0;
        for (long i=0;i<1000000001;i++){
            x=x+i;
        }
        System.out.println(x);

    }

}

How would one optimize the C++ code to be as fast as the JAVA version? Is it even possible?

NathanOliver
  • 171,901
  • 28
  • 288
  • 402
Brandon Ramos
  • 47
  • 1
  • 6
  • 10
    Did you turn on optimizations when you compiled the C++ code? – NathanOliver Sep 11 '17 at 19:52
  • 7
    `cout << (1000000001)*(1000000002)/2 << endl;` – 463035818_is_not_an_ai Sep 11 '17 at 19:55
  • I have not, I'm brand new to C++ (previous experience in Java and Python only) and don't know much about compilers. – Brandon Ramos Sep 11 '17 at 19:57
  • @BrandonRamos What are you using to compile the code? – NathanOliver Sep 11 '17 at 19:58
  • 5
    Neither language has any inherent speed advantages over the other. Different _implementations_ of either language may be faster or slower, and the answer might also depend on implementation-specific compile-time and/or run-time options that you might choose. Back when Java was new, the _only_ real Java implementation was slower than pretty much _any_ C implementation on the same hardware, but that was decades ago. – Solomon Slow Sep 11 '17 at 19:58
  • @tobi303 That's like suggesting that people running a 400m race should just run in a straight line from their starting positions to the finish line. – Dawood ibn Kareem Sep 11 '17 at 19:59
  • @NathanOliver CodeBlocks IDE and GCC compiler. – Brandon Ramos Sep 11 '17 at 19:59
  • @DawoodibnKareem, no, it's more like suggesting that the runners should just teleport from the start to the finish. – Solomon Slow Sep 11 '17 at 19:59
  • 2
    @DawoodibnKareem if the task is to get to the finish line as efficient as possible, thats what I would do – 463035818_is_not_an_ai Sep 11 '17 at 20:00
  • 1
    @BrandonRamos If you have a build target drop down in the menu bars and if it is set to debug change it to release. – NathanOliver Sep 11 '17 at 20:01
  • But the task is to measure the relative speeds of two different implementations. Teleportation doesn't help with the measurement. – Dawood ibn Kareem Sep 11 '17 at 20:01
  • @NathanOliver Thanks, that sped it up quite a bit. Down to ~.1 seconds. Does changing it to "release" automatically flag it for optimizations for speed? – Brandon Ramos Sep 11 '17 at 20:04
  • 1
    @BrandonRamos Yes. It turns on a bunch of optimization flags. (well normally one that encompasses a whole group of them). – NathanOliver Sep 11 '17 at 20:04
  • 3
    Obligatory: [How do I write a correct micro-benchmark in Java?](https://stackoverflow.com/questions/504103) – Jorn Vernee Sep 11 '17 at 20:07
  • 3
    Except that optimizing compilers _can_ identify and handle certain loops. So it's more like suggesting that racers in Star Trek who are allowed to use the teleporter, should do so. And they should. – yshavit Sep 11 '17 at 20:08
  • 1
    @DawoodibnKareem it is a bit far fetched, but it is imaginable, that a compiler would detect this pattern and turn it into something similar to what I suggested. Anyhow, imho when comparing languages one should compare how easy it is to express solutions and not how fast it can make the cpu produce heat, so my first comment wasnt only for fun... – 463035818_is_not_an_ai Sep 11 '17 at 20:09
  • @jameslarge Actually, this is not true. While you are correct that the speed of implementations differ, and some Java implementations will be faster than some C++ implementations, C++ is inherently a faster language. This is because it gives the programmer the facility to optimally manipulate memory, which is a detail that Java hides. Whether or not the optimal implementation is actually achievable by a human writing C++... well that's another question! – sdgfsdh Sep 11 '17 at 20:59

3 Answers3

8

This question is a perfect example of what not to do. The whole loop is equivalent to a single assignment and any optimizing compiler knows it. So you're measuring how long it takes to start the program and output a line.

Then Java must lose by any factor you wish as running the Java code includes starting the JVM and that's pretty slow. Moreover, it includes the optimizing compilation. What javac did is just the compilation from Java source to Java bytecode and there's no attempt to optimize anything. All the optimizations happen at runtime (bytecode to machine code). 1

So we can conclude that Java is terribly slow for any task taking less than a few seconds. You can get a factor of 20 or infinity (division by zero), if you try hard enough.

The more important conclusion is that it makes no sense. See How do I write a correct micro-benchmark in Java?, if you want a meaningful result.


1 This holds for desktop Java. On Android, it's different.

maaartinus
  • 44,714
  • 32
  • 161
  • 320
5

If you compile with optimizations, then the C++ version is considerably faster.

Java:

javac Main.java

$ time java Main
500000000500000000

real    0m0.727s
user    0m0.724s
sys     0m0.004s

C++:

clang -O3 main.cpp -o cpp

$ time ./cpp 
500000000500000000

real    0m0.003s
user    0m0.000s
sys     0m0.000s

My Clang version:

$ clang --version
clang version 4.0.0-1ubuntu1 (tags/RELEASE_400/rc1)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

My Java version:

$ javac -version
javac 1.8.0_144

The reason for this is that optimization is a slow process; you get quicker compilation times if you turn optimizations off. This is better for development, so this is the defaults that the Clang developers chose. Java is probably faster because it does more optimizations at run-time. JVM bytecode is not that different to the source-code it compiled from!

sdgfsdh
  • 33,689
  • 26
  • 132
  • 245
  • 1
    JVM bytecode is about as relevant for the speed as the source code. No contemporary java source to java bytecode compiler even attempts to make *any* optimizations; they all happen at runtime. Your 0.727 s are partly the time for the JVM start and partly the compilation time (bytecode to machine code). OK, some execution time is there, too. – maaartinus Sep 13 '17 at 12:56
  • ...and I strongly suspect that clang also converted the loop to a constant at compile time (like gcc did in [this answer](https://stackoverflow.com/a/46163584/2513200)) – Hulk Sep 15 '17 at 09:19
  • @Hulk Almost certainly... but isn't that the point? C++ provides some great tooling so that developers don't have to worry about spotting these optimizations. – sdgfsdh Sep 15 '17 at 09:21
  • 1
    @sdgfsdh yes, and Java's JIT compiler may be able to dynamically optimize for the data it actually receives at runtime - the only thing to learn here is that this benchmark does not measure what the OP expected it to measure. – Hulk Sep 15 '17 at 09:27
  • Of course, but the question is are we testing a long-lived application or a short one (as described by the OP)? C++ clearly has better (mainstream) tooling for the short-lived case. – sdgfsdh Sep 15 '17 at 09:29
  • 2
    @sdgfsdh Sure, that is definitely true – Hulk Sep 15 '17 at 09:31
3

Compile the C code with the -O option.

Assembly generated without -O contains lots of memory access (slow):

main:
  push rbp
  mov rbp, rsp
  mov QWORD PTR [rbp-8], 0
  mov QWORD PTR [rbp-16], 0
.L3:
  cmp QWORD PTR [rbp-16], 1000000000
  jg .L2
  mov rax, QWORD PTR [rbp-16]
  add QWORD PTR [rbp-8], rax
  add QWORD PTR [rbp-16], 1
  jmp .L3
.L2:

Assembly generated with -O only uses registers:

main:
  mov eax, 1000000001
.L2:
  sub rax, 1
  jne .L2

See Godbolt's GCC explorer output: https://godbolt.org/g/rx1Va4

EDIT: In the optimized mode, the compiler recognizes that the output is a constant, that's why there is no add instruction. See Nathan's example with output: https://godbolt.org/g/r1PxvL

Stefan Haustein
  • 18,427
  • 3
  • 36
  • 51
  • 1
    Your example is a little flawed as you do not do the output. That means the compiler just optimizes out the loop since it doesn't need to calculate a value in the second version. What is cool though is if you add the print out of `x` you can see gcc just calculates the value at compile time and inserts a constant into the assembly: https://godbolt.org/g/r1PxvL – NathanOliver Sep 11 '17 at 20:16
  • 1
    @NathanOliver yeah teleportation wins :) – 463035818_is_not_an_ai Sep 11 '17 at 20:18