benchmark of simple math functions: why is Fortran and Julia faster than C

Question

There are various arguments that in some cases, Fortran can be faster than C, for example when it comes to aliasing and I often heard that it does better auto-vectorization than C (see here for some good discussion).

However, for simple functions as calculation the Fibonaci number and the Mandelbrot at some complex number with straight-forward solutions without any tricks and extra hints/keywords to the compiler, I would have expected that they really perform the same.

C implementation:

int fib(int n) {
    return n < 2 ? n : fib(n-1) + fib(n-2);
}

int mandel(double complex z) {
    int maxiter = 80;
    double complex c = z;
    for (int n=0; n<maxiter; ++n) {
        if (cabs(z) > 2.0) {
            return n;
        }
        z = z*z+c;
    }
    return maxiter;
}

Fortran implementation:

integer, parameter :: dp=kind(0.d0)          ! double precision

integer recursive function fib(n) result(r)
integer, intent(in) :: n
if (n < 2) then
    r = n
else
    r = fib(n-1) + fib(n-2)
end if
end function

integer function mandel(z0) result(r)
complex(dp), intent(in) :: z0
complex(dp) :: c, z
integer :: n, maxiter
maxiter = 80
z = z0
c = z0
do n = 1, maxiter
    if (abs(z) > 2) then
        r = n-1
        return
    end if
    z = z**2 + c
end do
r = maxiter
end function

Julia implementation:

fib(n) = n < 2 ? n : fib(n-1) + fib(n-2)

function mandel(z)
    c = z
    maxiter = 80
    for n = 1:maxiter
        if abs(z) > 2
            return n-1
        end
        z = z^2 + c
    end
    return maxiter
end

(The full code including other benchmark functions can be found here.)

According to the Julia homepage, Julia and Fortran (with -O3) perform better than C (with -O3) on these two functions.

How can that be?

@HighPerformanceMark: I cannot really reproduce. But that is atm because of [this](http://stackoverflow.com/questions/20001184/static-pre-calculation-optimization-in-clang). — Albert, Nov 15 '13 at 13:12
Ok here https://github.com/JuliaLang/julia/blob/master/test/perf/micro/perf.f90 — Vladimir F Героям слава, Nov 15 '13 at 13:12
I am usually wary of statements that read, "*taking best timing from all optimization levels*". Seems to me that you ought to take either (1) average of the fastest *n* run-times or (2) average of all run-times. Also, that method of computing Fibonacci numbers is horrendously slow, one should use the [matrix method](http://en.wikipedia.org/wiki/Fibonacci_number#Matrix_form) or its related memoized form. — Kyle Kanos, Nov 15 '13 at 13:56
@KyleKanos: For a purely CPU bound benchmark test, I don't agree. The minimum/best time is the closest you can get to the real value of execution time. About the implementation: It is especially the point that it is the straight-forward implementation, not a clever one. — Albert, Nov 15 '13 at 14:12
@Albert: You are unlikely to *always* get the best time when you are actually running your code (not these "benchmarks"). Therefore, it's better to take an average of a set of numbers that, when combined, more accurately reflect the *expected* run-times. — Kyle Kanos, Nov 15 '13 at 14:19

score 8 · Accepted Answer · answered Nov 15 '13 at 18:00

Honestly, I wouldn't take these differences too seriously. Different C compilers will give different results too. Try running the C microbenchmarks using GCC and Clang and you will get almost as much difference as C vs. Fortran. Why is GCC sometimes faster than Clang and sometimes not? They just do different optimizations and code generation in different ways. The relative performance is also different on different hardware since it may depend on exact numbers of registers, cache sizes, degree of superscalar throughput, relative speed of various instructions, etc.

It is curious that Fortran is so much faster for the fib benchmark, so if someone figures that one out and posts an answer here, I'll gladly upvote it, but the ≤ 15% difference on the mandel and other benchmarks is just not all that remarkable. The most mysterious thing to me about these benchmarks is why Fortran is so slow at integer parsing. I suspect it's because that code is doing something dumb, but I'm not a Fortran coder so I'm not sure what should be improved. If anyone reading this is a Fortran pro and wants to take a look at this code, it would be greatly appreciated. I suspect that Fortran being 5x slower than C is just wrong.

One thing to note is that in collating these benchmark results, we reject times that are zero to avoid counting cases where the compiler just constant-folded the entire computation. On some optimization levels that is exactly what both the C and Fortran compilers do and it's pretty difficult to force them not to do this, short of using a lower optimization level. If someone wants to figure out how to force the compilers not to constant fold these results while still fully optimizing benchmark code, that would be a welcomed contribution. (One possible approach is to compile the benchmark functions as a shared library using full optimizations and then link that into the main program with link-time optimizations turned off. It's tricky but it might work.)

Ultimately, worrying too much about exact microbenchmark numbers is missing the bigger picture. The point of these benchmarks is that some languages have reliably fast standard implementations – like C, Fortran, Julia and Go – while other languages don't. In the slow languages, you sometimes have to resort to using a different language to get the performance you need, while in the reliably fast languages, you never have to do that. That's really all this is about. The exact relative performance of fast languages is an arms race: one language may pull ahead sometimes, but the others will always be close behind – the key thing is that they're in the race at all.

New issue created to track trying to prevent constant folding in C and Fortran without reducing the optimization level: https://github.com/JuliaLang/julia/issues/4821. — StefanKarpinski, Nov 15 '13 at 18:31
In checking the optimized assembler output (`-S` flag), C's function takes up about 70 lines while gfortran's uses 115 (I'm using 4.4.7, so maybe some differences)? Seems like the `ichar` function is an expensive call. — Kyle Kanos, Nov 15 '13 at 18:43
Everything you say also holds true for the micro-benchmarks in Julia, i.e. the JIT LLVM compiled Julia code, don't they? Esp. in the case of `mandel`, I don't see how the Julia JIT LLVM compiler could do anything more clever than a C compiler. — Albert, Nov 15 '13 at 20:06
Btw., I always have that constant-folding case for `fib` in C, compiled with Clang. I have even asked about that [here](http://stackoverflow.com/questions/20001184/static-pre-calculation-optimization-in-clang). — Albert, Nov 15 '13 at 20:08

benchmark of simple math functions: why is Fortran and Julia faster than C

1 Answers1